Introduction


I wrote this text for a one semester course at the sophomore-junior level. Our experience with students taking 
our junior physics courses is that even if they've had the mathematical prerequisites, they usually need more experience 
using the mathematics to handle it efficiently and to possess usable intuition about the processes involved. If you’ve seen 
infinite series in a calculus course, you may have no idea that they're good for anything. If you've taken a differential 
equations course, which of the scores of techniques that you’ve seen are really used a lot? 

The world is (at least) three dimensional so you clearly need to understand multiple integrals, but will everything 
be rectangular? 

How do you learn intuition? 

When you've finished a problem and your answer agrees with the back of the book or with your friends or even a 
teacher, you’re not done. The way to get an intuitive understanding of the mathematics and of the physics is to analyze 
your solution thoroughly. Does it make sense? There are almost always several parameters that enter the problem, so 
what happens to your solution when you push these parameters to their limits? In a mechanics problem, what if one 
mass is much larger than another? Does your solution do the right thing? In electromagnetism, if you make a couple of 
parameters equal to each other does it reduce everything to a simple, special case? When you’re doing a surface integral 
should the answer be positive or negative and does your answer agree? 

When you address these questions to every problem you ever solve, you do several things. First, you'll find your 
own mistakes before someone else does. Second, you acquire an intuition about how the equations ought to behave and 
how the world that they describe ought to behave. Third, it makes all your later efforts easier because you will then have
some clue about why the equations work the way they do. It reifies the algebra. 

Does it take extra time? Of course. It will however be some of the most valuable extra time you can spend. 

Is it only the students in my classes, or is it a widespread phenomenon that no one is willing to sketch a graph? 
(“Pulling teeth” is the cliche that comes to mind.) Maybe you've never been taught that there are a few basic methods
that work, so look at section 1.8. And keep referring to it. This is one of those basic tools that is far more important 
than you’ve ever been told. It is astounding how many problems become simpler after you’ve sketched a graph. Also, 
until you’ve sketched some graphs of functions you really don't know how they behave. 

When I taught this course I didn’t do everything that I’m presenting here. The two chapters, Numerical Analysis 
and Tensors, were not in my one semester course, and I didn't cover all of the topics along the way. Several more chapters 
were added after the class was over, so this is now far beyond a one semester text. There is enough here to select from 
if this is a course text, but if you are reading it on your own then you can move through it as you please, though you 
will find that the first five chapters are used more in the later parts than are chapters six and seven. Chapters 8, 9, and
13 form a sort of package. I've tried to use examples that are not all repetitions of the ones in traditional physics texts
but that do provide practice in the same tools that you need in that context. 

The pdf file that I've placed online is hyperlinked, so that you can click on an equation or section reference to go 
to that point in the text. To return, there’s a Previous View button at the top or bottom of the reader or a keyboard 
shortcut to do the same thing. [Command-← on Mac, Alt-← on Windows, Control-← on Linux-GNU] The index pages
are hyperlinked, and the contents also appear in the bookmark window. 

I chose this font for the display versions of the text because it appears better on the screen than does the more 
common Times font. The choice of available mathematics fonts is more limited. 

I'd like to thank the students who found some, but probably not all, of the mistakes in the text. Also Howard 
Gordon, who used it in his course and provided me with many suggestions for improvements. Prof. Joseph Tenn of 
Sonoma State University has given me many very helpful ideas, correcting mistakes, improving notation, and suggesting 
ways to help the students. 


2008 

A change in notation in this edition: For polar and cylindrical coordinate systems it is common to use theta for the polar 
angle in one and phi for the polar angle in the other. I had tried to make them the same ($\theta$) to avoid confusion, but
probably made it less rather than more helpful because it differed from the spherical azimuthal coordinate. In this edition
all three systems (plane polar, cylindrical, spherical) use phi as $\phi = \tan^{-1}(y/x)$. In line integrals it is common to use
$ds$ for an element of length, and many authors will use $dS$ for an element of area. I have tried to avoid this confusion
by sticking to $d\ell$ and $dA$ respectively (with rare exceptions).

In many of the chapters there are “exercises” that precede the “problems.” These are supposed to be simpler and
mostly designed to establish some of the definitions that appeared in the text. 

This text is now available in print from Dover Publishers. They have agreed that the electronic version will remain 
available online.

Basic Stuff


1.1 Trigonometry 

The common trigonometric functions are familiar to you, but do you know some of the tricks to remember (or to derive 
quickly) the common identities among them? Given the sine of an angle, what is its tangent? Given its tangent, what 
is its cosine? All of these simple but occasionally useful relations can be derived in about two seconds if you understand 
the idea behind one picture. Suppose for example that you know the tangent of $\theta$, what is $\sin\theta$? Draw a right triangle
and designate the tangent of $\theta$ as $x$, so you can draw a triangle with $\tan\theta = x/1$.

The Pythagorean theorem says that the third side is $\sqrt{1 + x^2}$. You now read the sine
from the triangle as $x/\sqrt{1 + x^2}$, so

$$\sin\theta = \frac{\tan\theta}{\sqrt{1 + \tan^2\theta}}$$

Any other such relation is done the same way. You know the cosine, so what's the cotangent? Draw a different triangle 
where the cosine is $x/1$.
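
A quick numerical check of the triangle argument, as a minimal Python sketch. The function name `sin_from_tan` and the test angle are my own choices for illustration, and the check assumes an angle in the first quadrant, where all three sides of the triangle are positive.

```python
import math

def sin_from_tan(x):
    """Sine of the angle whose tangent is x, read off a right triangle
    with opposite side x, adjacent side 1, and hypotenuse sqrt(1 + x^2)."""
    return x / math.sqrt(1.0 + x * x)

theta = 0.7  # radians, first quadrant (an arbitrary test value)
# The two printed numbers should agree to rounding error.
print(sin_from_tan(math.tan(theta)), math.sin(theta))
```

Drawing the other triangle (adjacent side x, hypotenuse 1) and changing the return line accordingly gives the cotangent from the cosine in the same way.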

Radians 

When you take the sine or cosine of an angle, what units do you use? Degrees? Radians? Cycles? And who invented 
radians? Why is this the unit you see so often in calculus texts? That there are 360° in a circle is something that you 
can blame on the Sumerians, but where did this other unit come from? 



It results from one figure and the relation between the radius of the circle, the angle drawn, and the length of the 
arc shown. If you remember the equation $s = R\theta$, does that mean that for a full circle $\theta = 360^\circ$ so $s = 360R$? No.
For some reason this equation is valid only in radians. The reasoning comes down to a couple of observations. You can
see from the drawing that $s$ is proportional to $\theta$: double $\theta$ and you double $s$. The same observation holds about the
relation between $s$ and $R$, a direct proportionality. Put these together in a single equation and you can conclude that

$$s = CR\theta$$




where $C$ is some constant of proportionality. Now what is $C$?

You know that the whole circumference of the circle is $2\pi R$, so if $\theta = 360^\circ$, then

$$2\pi R = CR\,360^\circ, \quad\text{and}\quad C = \frac{\pi}{180}\ \text{degree}^{-1}$$


It has to have these units so that the left side, s, comes out as a length when the degree units cancel. This is an 
awkward equation to work with, and it becomes very awkward when you try to do calculus. An increment of one in $\Delta\theta$
is big if you're in radians, and small if you're in degrees, so it should be no surprise that $\Delta\sin\theta/\Delta\theta$ is much smaller in
the latter units:

$$\frac{d}{d\theta}\sin\theta = \frac{\pi}{180}\cos\theta \qquad\text{in degrees}$$


This is the reason that the radian was invented. The radian is the unit designed so that the proportionality constant is 
one. 

$$C = 1\ \text{radian}^{-1} \quad\text{then}\quad s = (1\ \text{radian}^{-1})R\theta$$


In practice, no one ever writes it this way. It's the custom simply to omit the $C$ and to say that $s = R\theta$ with $\theta$ restricted
to radians; it saves a lot of writing. How big is a radian? A full circle has circumference $2\pi R$, and this equals $R\theta$
when you’ve taken $C$ to be one. It says that the angle for a full circle has $2\pi$ radians. One radian is then $360/2\pi$ degrees,
a bit under 60°. Why do you always use radians in calculus? Only in this unit do you get simple relations for derivatives
and integrals of the trigonometric functions. 
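
To see the factor of π/180 appear, you can estimate the derivative of the sine numerically in both units. This is only an illustrative sketch; the helper names `sin_deg` and `slope` and the step size are assumptions of mine, not anything standard.

```python
import math

def sin_deg(theta_deg):
    """Sine of an angle that is given in degrees."""
    return math.sin(math.radians(theta_deg))

def slope(f, x, h=1e-4):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

theta_deg = 30.0
theta_rad = math.radians(theta_deg)

in_degrees = slope(sin_deg, theta_deg)    # change in sine per degree
in_radians = slope(math.sin, theta_rad)   # change in sine per radian

# Per degree, the slope is (pi/180) cos(theta); per radian it is just cos(theta).
print(in_degrees, (math.pi / 180) * math.cos(theta_rad))
print(in_radians, math.cos(theta_rad))
```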


Hyperbolic Functions 

The circular trigonometric functions, the sines, cosines, tangents, and their reciprocals are familiar, but their hyperbolic 
counterparts are probably less so. They are related to the exponential function as 


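$$\cosh x = \frac{e^x + e^{-x}}{2}, \qquad \sinh x = \frac{e^x - e^{-x}}{2}, \qquad \tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}} \qquad (1.1)$$

The other three functions are

$$\operatorname{sech} x = \frac{1}{\cosh x}, \qquad \operatorname{csch} x = \frac{1}{\sinh x}, \qquad \coth x = \frac{1}{\tanh x}$$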


Drawing these is left to problem 1.4, with a stopover in section 1.8 of this chapter. 

Just as with the circular functions there are a bunch of identities relating these functions. For the analog of 
$\cos^2\theta + \sin^2\theta = 1$ you have

$$\cosh^2\theta - \sinh^2\theta = 1 \qquad (1.2)$$

For a proof, simply substitute the definitions of cosh and sinh in terms of exponentials and watch the terms cancel.
(See problem 4.23 for a different approach to these functions.) Similarly the other common trig identities have their 
counterpart here. 

$$1 + \tan^2\theta = \sec^2\theta \quad\text{has the analog}\quad 1 - \tanh^2\theta = \operatorname{sech}^2\theta \qquad (1.3)$$

The reason for this close parallel lies in the complex plane, because $\cos(ix) = \cosh x$ and $\sin(ix) = i\sinh x$. See chapter
three.
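
If you would rather see the terms cancel numerically before doing the algebra, here is a small Python sketch. It builds cosh and sinh directly from exponentials (deliberately not using the library versions) and tests the two identities and the complex-plane relations at one arbitrary value of x.

```python
import cmath
import math

def cosh(x):
    """Hyperbolic cosine from its definition in terms of exponentials."""
    return (math.exp(x) + math.exp(-x)) / 2

def sinh(x):
    """Hyperbolic sine from its definition in terms of exponentials."""
    return (math.exp(x) - math.exp(-x)) / 2

x = 1.3  # arbitrary test value
tanh_x = sinh(x) / cosh(x)
sech_x = 1 / cosh(x)

print(cosh(x) ** 2 - sinh(x) ** 2)        # analog of cos^2 + sin^2 = 1
print(1 - tanh_x ** 2, sech_x ** 2)       # analog of 1 + tan^2 = sec^2
print(cmath.cos(1j * x), cosh(x))         # cos(ix) against cosh x
print(cmath.sin(1j * x), 1j * sinh(x))    # sin(ix) against i sinh x
```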

The inverse hyperbolic functions are easier to evaluate than are the corresponding circular functions. I'll solve for 
the inverse hyperbolic sine as an example 


$$y = \sinh x \quad\text{means}\quad x = \sinh^{-1} y, \qquad y = \frac{e^x - e^{-x}}{2}, \quad\text{solve for } x.$$

Multiply by $2e^x$ to get the quadratic equation

$$2e^x y = e^{2x} - 1 \quad\text{or}\quad (e^x)^2 - 2y(e^x) - 1 = 0$$

The solutions to this are $e^x = y \pm \sqrt{y^2 + 1}$, and because $\sqrt{y^2 + 1}$ is always greater than $|y|$, you must take the positive
sign to get a positive $e^x$. Take the logarithm of $e^x$ and

$$x = \sinh^{-1} y = \ln\left(y + \sqrt{y^2 + 1}\right) \qquad (-\infty < y < +\infty)$$


As $x$ goes through the values $-\infty$ to $+\infty$, the values that $\sinh x$ takes on go over the range $-\infty$ to $+\infty$. This implies
that the domain of $\sinh^{-1} y$ is $-\infty < y < +\infty$. The graph of an inverse function is the mirror image of the original
function in the 45° line $y = x$, so if you have sketched the graphs of the original functions, the corresponding inverse
functions are just the reflections in this diagonal line. 
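
As a check on the choice of sign, this short sketch compares the logarithm formula with Python's built-in `math.asinh` and confirms that applying sinh to the result recovers y. The name `asinh_log` is just a label for this example.

```python
import math

def asinh_log(y):
    """Inverse hyperbolic sine from the positive root of the quadratic in e^x:
    e^x = y + sqrt(y^2 + 1), so x = ln(y + sqrt(y^2 + 1))."""
    return math.log(y + math.sqrt(y * y + 1.0))

for y in (-3.0, -0.5, 0.0, 2.0, 10.0):
    x = asinh_log(y)
    # Columns: y, the log formula, the library value, and sinh(x) (should return y).
    print(y, x, math.asinh(y), math.sinh(x))
```

Taking the negative sign instead would put a negative number inside the logarithm for every real y, which is another way to see that only the positive root survives.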
The other inverse functions are found similarly; see problem 1.3

$$\begin{aligned}
\sinh^{-1} y &= \ln\left(y + \sqrt{y^2 + 1}\right) \\
\cosh^{-1} y &= \ln\left(y \pm \sqrt{y^2 - 1}\right), & y &\ge 1 \\
\tanh^{-1} y &= \frac{1}{2}\ln\frac{1 + y}{1 - y}, & |y| &< 1 \\
\coth^{-1} y &= \frac{1}{2}\ln\frac{y + 1}{y - 1}, & |y| &> 1
\end{aligned} \qquad (1.4)$$


The $\cosh^{-1}$ function is commonly written with only the + sign before the square root. What does the other sign do?
Draw a graph and find out. Also, what happens if you add the two versions of the $\cosh^{-1}$?

The calculus of these functions parallels that of the circular functions. 


$$\frac{d}{dx}\sinh x = \frac{d}{dx}\,\frac{e^x - e^{-x}}{2} = \frac{e^x + e^{-x}}{2} = \cosh x$$

Similarly the derivative of $\cosh x$ is $\sinh x$. Note the plus sign here, not minus.

Where do hyperbolic functions occur? If you have a mass in equilibrium, the total force on it is zero. If it’s in 
stable equilibrium then if you push it a little to one side and release it, the force will push it back to the center. If it is 
unstable then when it’s a bit to one side it will be pushed farther away from the equilibrium point. In the first case, it 
will oscillate about the equilibrium position and for small oscillations the function of time will be a circular trigonometric 
function, the common sines or cosines of time, $A\cos\omega t$. If the point is unstable, the motion will be described by
hyperbolic functions of time, $\sinh\omega t$ instead of $\sin\omega t$. An ordinary ruler held at one end will swing back and forth,
but if you try to balance it at the other end it will fall over. That's the difference between cos and cosh. For a deeper 
understanding of the relation between the circular and the hyperbolic functions, see section 3.3 


1.2 Parametric Differentiation 

The integration techniques that appear in introductory calculus courses include a variety of methods of varying usefulness. 
There's one however that is for some reason not commonly done in calculus courses: parametric differentiation. It’s best 
introduced by an example. 

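$$\int_0^\infty x^n e^{-x}\,dx$$

You could integrate by parts $n$ times and that will work. For example, $n = 2$:

$$\int_0^\infty x^2 e^{-x}\,dx = -x^2 e^{-x}\Big|_0^\infty + \int_0^\infty 2x e^{-x}\,dx = 0 - 2x e^{-x}\Big|_0^\infty + \int_0^\infty 2e^{-x}\,dx = 0 - 2e^{-x}\Big|_0^\infty = 2$$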




Instead of this method, do something completely different. Consider the integral 

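$$\int_0^\infty e^{-ax}\,dx \qquad (1.5)$$

It has the parameter $a$ in it. The reason for this will be clear in a few lines. It is easy to evaluate, and is

$$\int_0^\infty e^{-ax}\,dx = -\frac{1}{a}e^{-ax}\Big|_0^\infty = \frac{1}{a}$$

Now differentiate this integral with respect to $a$,

$$\frac{d}{da}\int_0^\infty e^{-ax}\,dx = \frac{d}{da}\frac{1}{a} \quad\text{or}\quad -\int_0^\infty x e^{-ax}\,dx = \frac{-1}{a^2}$$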

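And again and again:

$$+\int_0^\infty x^2 e^{-ax}\,dx = \frac{+2}{a^3}, \qquad -\int_0^\infty x^3 e^{-ax}\,dx = \frac{-2\cdot 3}{a^4}$$

The $n^{\text{th}}$ derivative is

$$\pm\int_0^\infty x^n e^{-ax}\,dx = \frac{\pm n!}{a^{n+1}} \qquad (1.6)$$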


Set $a = 1$ and you see that the original integral is $n!$. This result is compatible with the standard definition for $0!$. From
the equation $n! = n\cdot(n-1)!$, you take the case $n = 1$, and it requires $0! = 1$ in order to make any sense. This integral
gives the same answer for n = 0. 

The idea of this method is to change the original problem into another by introducing a parameter. Then 
differentiate with respect to that parameter in order to recover the problem that you really want to solve. With a little 
practice you’ll find this easier than partial integration. Also see problem 1.47 for a variation on this theme. 

Notice that I did this using definite integrals. If you try to use it for an integral without limits you can sometimes 
get into trouble. See for example problem 1.42. 
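
You can test Eq. (1.6) without doing any integration by parts. The sketch below estimates the integral of x^n e^(-ax) by a crude midpoint sum and compares it with n!/a^(n+1). The function name, the cutoff that replaces the infinite upper limit, and the number of steps are all choices made for this illustration.

```python
import math

def integral_xn_exp(n, a, upper=60.0, steps=20_000):
    """Midpoint-rule estimate of the integral of x^n e^(-a x) from 0 to infinity.
    The infinite range is cut off at x = upper/a, where the integrand is negligible."""
    top = upper / a
    h = top / steps
    return h * sum(((k + 0.5) * h) ** n * math.exp(-a * (k + 0.5) * h)
                   for k in range(steps))

for n in range(5):
    for a in (1.0, 2.0):
        # Columns: n, a, numerical integral, n!/a^(n+1)
        print(n, a, integral_xn_exp(n, a), math.factorial(n) / a ** (n + 1))
```

The n = 0 rows are the starting integral 1/a; every later row is what one more derivative with respect to a produces.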


1.3 Gaussian Integrals 

Gaussian integrals are an important class of integrals that show up in kinetic theory, statistical mechanics, quantum 
mechanics, and any other place with a remotely statistical aspect. 


$$\int dx\, x^n e^{-ax^2}$$

The simplest and most common case is the definite integral from $-\infty$ to $+\infty$ or maybe from 0 to $\infty$.
If $n$ is a positive odd integer, these are elementary,

$$\int_{-\infty}^{\infty} dx\, x^n e^{-ax^2} = 0 \qquad (n \text{ odd}) \qquad (1.7)$$


To see why this is true, sketch graphs of the integrand for a few more odd n. 

For the integral over positive $x$ and still for odd $n$, do the substitution $t = ax^2$.

$$\int_0^\infty dx\, x^n e^{-ax^2} = \frac{1}{2a^{(n+1)/2}}\int_0^\infty dt\, t^{(n-1)/2} e^{-t} = \frac{1}{2a^{(n+1)/2}}\big((n-1)/2\big)! \qquad (1.8)$$

Because $n$ is odd, $(n-1)/2$ is an integer and its factorial makes sense.

If n is even then doing this integral requires a special preliminary trick. Evaluate the special case n = 0 and 
$a = 1$. Denote the integral by $I$, then

$$I = \int_{-\infty}^{\infty} dx\, e^{-x^2}, \quad\text{and}\quad I^2 = \left(\int_{-\infty}^{\infty} dx\, e^{-x^2}\right)\left(\int_{-\infty}^{\infty} dy\, e^{-y^2}\right)$$

In squaring the integral you must use a different label for the integration variable in the second factor or it will get
confused with the variable in the first factor. Rearrange this and you have a conventional double integral.

$$I^2 = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-(x^2 + y^2)}$$


This is something that you can recognize as an integral over the entire x-y plane. Now the trick is to switch to polar 
coordinates*. The element of area $dx\,dy$ now becomes $r\,dr\,d\phi$, and the respective limits on these coordinates are 0 to
$\infty$ and 0 to $2\pi$. The exponent is just $r^2 = x^2 + y^2$.

$$I^2 = \int_0^\infty r\,dr \int_0^{2\pi} d\phi\, e^{-r^2}$$


* See section 1.7 in this chapter

The $\phi$ integral simply gives $2\pi$. For the $r$ integral substitute $r^2 = z$ and the result is 1/2. [Or use Eq. (1.8).] The two
integrals together give you $\pi$.

$$I^2 = \pi, \quad\text{so}\quad \int_{-\infty}^{\infty} dx\, e^{-x^2} = \sqrt{\pi} \qquad (1.9)$$


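Now do the rest of these integrals by parametric differentiation, introducing a parameter with which to carry out
the derivatives. Change $e^{-x^2}$ to $e^{-ax^2}$, then in the resulting integral change variables to reduce it to Eq. (1.9). You get

$$\int_{-\infty}^{\infty} dx\, e^{-ax^2} = \sqrt{\frac{\pi}{a}}, \quad\text{so}\quad \int_{-\infty}^{\infty} dx\, x^2 e^{-ax^2} = -\frac{d}{da}\sqrt{\frac{\pi}{a}} = \frac{1}{2}\left(\frac{\sqrt{\pi}}{a^{3/2}}\right) \qquad (1.10)$$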


You can now get the results for all the higher even powers of x by further differentiation with respect to a. 
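
Here is one way to convince yourself that differentiating with respect to the parameter does what is claimed. This Python sketch compares three numbers for the x^2 integral: a direct midpoint-rule estimate, a finite-difference derivative of sqrt(pi/a) with respect to a, and the closed form sqrt(pi)/(2 a^(3/2)). The helper `gauss_moment`, the cutoff, and the value of a are assumptions of the example.

```python
import math

def gauss_moment(n, a, cutoff=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of x^n e^(-a x^2) over the whole line,
    truncated to |x| < cutoff/sqrt(a), beyond which the integrand is negligible."""
    top = cutoff / math.sqrt(a)
    h = 2 * top / steps
    total = 0.0
    for k in range(steps):
        x = -top + (k + 0.5) * h
        total += x ** n * math.exp(-a * x * x)
    return h * total

def I0(a):
    """The n = 0 result, sqrt(pi/a)."""
    return math.sqrt(math.pi / a)

a, da = 1.7, 1e-5
direct = gauss_moment(2, a)
by_derivative = -(I0(a + da) - I0(a - da)) / (2 * da)   # -d/da sqrt(pi/a)
closed_form = 0.5 * math.sqrt(math.pi) / a ** 1.5

print(direct, by_derivative, closed_form)
print(gauss_moment(3, a))   # an odd power: the two halves cancel
```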

1.4 erf and Gamma 

What about the same integral, but with other limits? The odd-n case is easy to do in just the same way as when the 
limits are zero and infinity: just do the same substitution that led to Eq. (1.8). The even-n case is different because it 
can't be done in terms of elementary functions. It is used to define an entirely new function. 


$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x dt\, e^{-t^2} \qquad (1.11)$$

x    0.   0.25   0.50   0.75   1.00   1.25   1.50   1.75   2.00
erf  0.   0.276  0.520  0.711  0.843  0.923  0.966  0.987  0.995


This is called the error function. It's well studied and tabulated and even shows up as a button on some* pocket 
calculators, right along with the sine and cosine. (Is erf odd or even or neither?) (What is erf(±oo)?) 
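
Most programming languages now carry erf in their standard math library, so a table such as the one above is easy to regenerate. The sketch below prints erf at the same nine points twice: once from Python's `math.erf`, and once from a midpoint-rule evaluation of the defining integral (the helper `erf_integral` is written just for this comparison).

```python
import math

def erf_integral(x, steps=2000):
    """erf(x) from its definition, (2/sqrt(pi)) times the integral of e^(-t^2)
    from 0 to x, evaluated with the midpoint rule."""
    h = x / steps
    return 2 / math.sqrt(math.pi) * h * sum(
        math.exp(-((k + 0.5) * h) ** 2) for k in range(steps))

for i in range(9):
    x = 0.25 * i
    # Columns: x, erf from the integral, erf from the library
    print(f"{x:4.2f}  {erf_integral(x):.3f}  {math.erf(x):.3f}")
```

Evaluating `math.erf(-x)` and `math.erf(x)` for a few values, and then for a large x, is a quick way to answer the two questions in the parentheses.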

A related integral worthy of its own name is the Gamma function. 


$$\Gamma(x) = \int_0^\infty dt\, t^{x-1} e^{-t} \qquad (1.12)$$

The special case in which $x$ is a positive integer is the one that I did as an example of parametric differentiation
to get Eq. (1.6). It is

$$\Gamma(n) = (n-1)!$$


* See for example rpncalculator (v1.96 the latest). It is the best desktop calculator that I've found (Mac and
Windows). This main site seems (2008) to have disappeared, but I did find other sources by searching the web for
the pair “rpncalculator” and baker. The latter is the author's name. I found mac.rbytes.net/cat/mac/scientific/rpn-calculator-x/

The factorial is not defined if its argument isn’t an integer, but the Gamma function is perfectly well defined for
any argument as long as the integral converges. One special case is notable: x = 1/2. 

$$\Gamma(1/2) = \int_0^\infty dt\, t^{-1/2} e^{-t} = \int_0^\infty 2u\,du\, u^{-1} e^{-u^2} = 2\int_0^\infty du\, e^{-u^2} = \sqrt{\pi} \qquad (1.13)$$

I used $t = u^2$ and then the result for the Gaussian integral, Eq. (1.9). You can use parametric differentiation to derive a
simple and useful recursion relation. (See problem 1.14 or 1.47.)

$$x\Gamma(x) = \Gamma(x + 1) \qquad (1.14)$$

From this you can get the value of $\Gamma(1\tfrac{1}{2})$, $\Gamma(2\tfrac{1}{2})$, etc. In fact, if you know the value of the function in the interval
between one and two, you can use this relationship to get it anywhere else on the axis. You already know that
$\Gamma(1) = 1 = \Gamma(2)$. (You do? How?) As $x$ approaches zero, use the relation $\Gamma(x) = \Gamma(x + 1)/x$ and because the numerator for
small $x$ is approximately 1, you immediately have that

$$\Gamma(x) \sim 1/x \quad\text{for small } x \qquad (1.15)$$

The integral definition, Eq. (1.12), for the Gamma function is defined only for the case that $x > 0$. [The behavior
of the integrand near $t = 0$ is approximately $t^{x-1}$. Integrate this from zero to something and see how it depends on $x$.]
Even though the original definition of the Gamma function fails for negative $x$, you can extend the definition by using
Eq. (1.14) to define $\Gamma$ for negative arguments. What is $\Gamma(-1/2)$ for example? Put $x = -1/2$ in Eq. (1.14).

$$-\frac{1}{2}\Gamma(-1/2) = \Gamma(-(1/2) + 1) = \Gamma(1/2) = \sqrt{\pi}, \quad\text{so}\quad \Gamma(-1/2) = -2\sqrt{\pi} \qquad (1.16)$$


The same procedure works for other negative x, though it can take several integer steps to get to a positive value of x 
for which you can use the integral definition Eq. (1.12). 
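
The stepping procedure is short enough to write out. In this sketch the function `gamma_extended` (a name invented here) uses the library Gamma function only for positive arguments and climbs up to them with Gamma(x) = Gamma(x + 1)/x. Python's `math.gamma` already accepts negative non-integer arguments, so it serves as an independent check.

```python
import math

def gamma_extended(x):
    """Gamma(x) for any x that is not zero or a negative integer.
    Negative arguments are raised one unit at a time with
    Gamma(x) = Gamma(x + 1) / x until the argument is positive."""
    divisor = 1.0
    while x <= 0:
        if x == int(x):
            raise ValueError("Gamma has poles at zero and the negative integers")
        divisor *= x      # accumulate the factors x (x + 1) (x + 2) ...
        x += 1.0
    return math.gamma(x) / divisor   # library call is made only for x > 0

print(gamma_extended(-0.5), -2 * math.sqrt(math.pi))
print(gamma_extended(-1.5), math.gamma(-1.5))
print(gamma_extended(-2.25), math.gamma(-2.25))
```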

The reason for introducing these two functions now is not that they are so much more important than a hundred 
other functions that I could use, though they are among the more common ones. The point is that the world doesn't end 
with polynomials, sines, cosines, and exponentials. There are an infinite number of other functions out there waiting for 
you and some of them are useful. These functions can’t be expressed in terms of the elementary functions that you’ve 
grown to know and love. They're different and have their distinctive behaviors.

There are zeta functions and Fresnel integrals and Legendre functions and Exponential integrals and Mathieu
functions and Confluent Hypergeometric functions and . . . you get the idea. When one of these shows up, you learn to 
look up its properties and to use them. If you're interested you may even try to understand how some of these properties 
are derived, but probably not the first time that you confront them. That’s why there are tables, and the “Handbook 
of Mathematical Functions” by Abramowitz and Stegun is a premier example of such a tabulation, and it’s reprinted by 
Dover Publications. There's also a copy on the internet* www.math.sfu.ca/~cbm/aands/ as a set of scanned page 
images. 

Why erf? 

What can you do with this function? The most likely application is probably to probability. If you flip a coin 1000 times, 
you expect it to come up heads about 500 times. But just how close to 500 will it be? If you flip it twice, you wouldn't 
be surprised to see two heads or two tails, in fact the equally likely possibilities are 

TT HT TH HH 

This says that in 1 out of $2^2 = 4$ such experiments you expect to see two heads and in 1 out of 4 you expect two tails.
For just 2 out of 4 times you do the double flip do you expect exactly one head. All this is an average. You have to try 
the experiment many times to see your expectation verified, and then only by averaging many experiments. 

It’s easier to visualize the counting if you flip N coins at once and see how they come up. The number of coins 
that come up heads won’t always be $N/2$, but it should be close. If you repeat the process, flipping $N$ coins again and
again, you get a distribution of numbers of heads that will vary around $N/2$ in a characteristic pattern. The result is
that the fraction of the time it will come up with $k$ heads and $N - k$ tails is, to a good approximation

$$\sqrt{\frac{2}{\pi N}}\, e^{-2\delta^2/N}, \quad\text{where}\quad \delta = k - \frac{N}{2} \qquad (1.17)$$

The derivation of this can wait until section 2.6, Eq. (2.26). It is an accurate result if the number of coins that you flip
in each trial is large, but try it anyway for the preceding example where $N = 2$. This formula says that the fraction of
times predicted for $k$ heads is

$$k = 0:\ \sqrt{1/\pi}\, e^{-1} = 0.208 \qquad k = 1 = N/2:\ 0.564 \qquad k = 2:\ 0.208$$

The exact answers are 1/4, 2/4, 1/4, but as two is not all that big a number, the fairly large error shouldn't be distressing. 
If you flip three coins, the equally likely possibilities are

TTT TTH THT HTT THH HTH HHT HHH

There are 8 possibilities here, $2^3$, so you expect (on average) one run out of 8 to give you 3 heads. Probability 1/8.

* now superseded by the online work dlmf.nist.gov/ at the National Institute of Standards and Technology

To see how accurate this claim is for modest values, take N = 10. The possible outcomes are anywhere from zero 
heads to ten. The exact fraction of the time that you get k heads as compared to this approximation is 

k =           0         1        2       3      4      5
exact:        .000977   .00977   .0439   .117   .205   .246
approximate:  .0017     .0103    .0417   .113   .206   .252

For the more interesting case of big $N$, the exponential, $e^{-2\delta^2/N}$, varies slowly and smoothly as $\delta$ changes in integer
steps away from zero. This is a key point; it allows you to approximate a sum by an integral. If $N = 1000$ and $\delta = 10$,
the exponential is 0.819. It has dropped only gradually from one. For the same $N = 1000$, the fraction of the time to get
exactly 500 heads is 0.025225, and this approximation is $\sqrt{2/1000\pi} = 0.025231$.
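
The comparisons quoted here are easy to reproduce. The following sketch computes the exact fraction, a binomial coefficient divided by 2^N, next to the approximation of Eq. (1.17). The two function names are mine; `math.comb` needs Python 3.8 or later.

```python
import math

def exact_fraction(N, k):
    """Exact probability of k heads in N fair flips: C(N, k) / 2^N."""
    return math.comb(N, k) / 2 ** N

def gaussian_fraction(N, k):
    """The approximation of Eq. (1.17), with delta = k - N/2."""
    delta = k - N / 2
    return math.sqrt(2 / (math.pi * N)) * math.exp(-2 * delta ** 2 / N)

# N = 2 and N = 10: small enough that the approximation is visibly imperfect.
for N in (2, 10):
    for k in range(N // 2 + 1):
        print(N, k, f"{exact_fraction(N, k):.6f}", f"{gaussian_fraction(N, k):.6f}")

# N = 1000 at the peak, k = 500.
print(exact_fraction(1000, 500), gaussian_fraction(1000, 500))
```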

Flip $N$ coins, then do it again and again. In what fraction of the trials will the result be between $N/2 - \Delta$
and $N/2 + \Delta$ heads? This is the sum of the fractions corresponding to $\delta = 0$, $\delta = \pm 1$, ..., $\delta = \pm\Delta$. Because the
approximate function is smooth, I can replace this sum with an integral. This substitution becomes more accurate the
larger $N$ is.

$$\int_{-\Delta}^{\Delta} d\delta\, \sqrt{\frac{2}{\pi N}}\, e^{-2\delta^2/N}$$

Make the substitution $2\delta^2/N = x^2$ and you have

$$\sqrt{\frac{2}{\pi N}} \int_{-\Delta\sqrt{2/N}}^{\Delta\sqrt{2/N}} \sqrt{\frac{N}{2}}\, dx\, e^{-x^2} = \frac{1}{\sqrt{\pi}} \int_{-\Delta\sqrt{2/N}}^{\Delta\sqrt{2/N}} dx\, e^{-x^2} = \operatorname{erf}\left(\Delta\sqrt{2/N}\right) \qquad (1.18)$$


The error function of one is 0.84, so if $\Delta = \sqrt{N/2}$ then in 84% of the trials heads will come up between $N/2 - \sqrt{N/2}$
and $N/2 + \sqrt{N/2}$ times. For $N = 1000$, this is between 478 and 522 heads.
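
How good is the replacement of the sum by an integral? The sketch below, under the assumption that N is even and the half-width is an integer, adds up the exact binomial fractions from N/2 - Delta to N/2 + Delta and sets the total beside erf(Delta sqrt(2/N)). The function names are illustrative only.

```python
import math

def exact_within(N, Delta):
    """Exact probability that the number of heads in N fair flips lies
    within Delta of N/2 (N even, Delta a non-negative integer)."""
    half = N // 2
    count = sum(math.comb(N, k) for k in range(half - Delta, half + Delta + 1))
    return count / 2 ** N

def erf_estimate(N, Delta):
    """The error-function estimate, erf(Delta * sqrt(2/N))."""
    return math.erf(Delta * math.sqrt(2 / N))

N = 1000
Delta = round(math.sqrt(N / 2))   # 22, so the window runs from 478 to 522 heads
print(Delta, exact_within(N, Delta), erf_estimate(N, Delta))
```

For N = 1000 the two numbers agree to within a couple of percent. The integral from -Delta to +Delta leaves out half a step at each end of the sum, which accounts for most of the difference.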

1.5 Differentiating 

When you differentiate a function in which the independent variable shows up in several places, how do you carry out 
the derivative? For example, what is the derivative with respect to $x$ of $x^x$? The answer is that you treat each instance
of $x$ one at a time, ignoring the others; differentiate with respect to that $x$ and add the results. For a proof, use the
definition of a derivative and differentiate the function $f(x, x)$. Start with the finite difference quotient:

$$\begin{aligned}
\frac{f(x + \Delta x, x + \Delta x) - f(x, x)}{\Delta x}
&= \frac{f(x + \Delta x, x + \Delta x) - f(x, x + \Delta x) + f(x, x + \Delta x) - f(x, x)}{\Delta x} \\
&= \frac{f(x + \Delta x, x + \Delta x) - f(x, x + \Delta x)}{\Delta x} + \frac{f(x, x + \Delta x) - f(x, x)}{\Delta x}
\end{aligned} \qquad (1.19)$$


The first quotient in the last equation is, in the limit that $\Delta x \to 0$, the derivative of $f$ with respect to its first argument.
The second quotient becomes the derivative with respect to the second argument. The prescription is clear, but to
remember it you may prefer a mathematical formula. A notation more common in mathematics than in physics is just
what's needed:

$$\frac{d}{dt} f(t, t) = D_1 f(t, t) + D_2 f(t, t) \qquad (1.20)$$

where $D_1$ means “differentiate with respect to the first argument.” The standard “product rule” for differentiation is a
special case of this. 

For example, 


$$\frac{d}{dx}\int_0^x e^{-xt^2}\,dt = e^{-x^3} - \int_0^x dt\, t^2 e^{-xt^2} \qquad (1.21)$$


The resulting integral in this example is related to an error function, see problem 1.13, so it’s not as bad as it looks. 
Another example, 

$$\begin{aligned}
\frac{d}{dx} x^x &= x\,x^{x-1} + \frac{d}{dx} k^x \quad\text{at } k = x \\
&= x\,x^{x-1} + \frac{d}{dx} e^{x\ln k} = x\,x^{x-1} + \ln k\, e^{x\ln k} \\
&= x^x + x^x \ln x
\end{aligned}$$
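
A numerical version of the same bookkeeping: treat x^x as a function f(u, v) = u^v of two separate slots, differentiate with respect to each slot while holding the other fixed, and add. The sketch uses a centered finite difference, and the names and the test point x = 1.8 are arbitrary choices for illustration.

```python
import math

def d_numeric(f, x, h=1e-6):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f_two_slots(u, v):
    """x^x written with two separate arguments: base u and exponent v."""
    return u ** v

x = 1.8
d1 = d_numeric(lambda u: f_two_slots(u, x), x)   # vary only the base
d2 = d_numeric(lambda v: f_two_slots(x, v), x)   # vary only the exponent
total = d_numeric(lambda t: t ** t, x)           # vary both at once

print(d1, x * x ** (x - 1))        # derivative in the first slot
print(d2, x ** x * math.log(x))    # derivative in the second slot
print(d1 + d2, total)              # the two pieces add up to the full derivative
```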


1.6 Integrals 

What is an integral? You’ve been using them for some time. I've been using the concept in this introductory chapter as 
if it's something that everyone knows. But what is it? 

If your answer is something like “the function whose derivative is the given function” or “the area under a curve”
then No. Both of these answers express an aspect of the subject but neither is a complete answer. The first actually 
refers to the fundamental theorem of calculus, and I'll describe that shortly. The second is a good picture that applies to 
some special cases, but it won’t tell you how to compute it and it won’t allow you to generalize the idea to the many 
other subjects in which it is needed. There are several different definitions of the integral, and every one of them requires 
more than a few lines to explain. I’ll use the most common definition, the Riemann Integral.

An integral is a sum, obeying all the usual rules of addition and multiplication, such as $1+2+3+4 = (1+2) + (3+4)$
or $5\cdot(6+7) = (5\cdot 6) + (5\cdot 7)$. When you've read this section, come back and translate these bits of arithmetic into statements
about integrals. 

A standard way to picture the definition is to try to find the area under a curve. You can get successively better 
and better approximations to the answer by dividing the area into smaller and smaller rectangles — ideally, taking the 
limit as the number of rectangles goes to infinity. 

To codify this idea takes a sequence of steps: 

1. Pick an integer N > 0. This is the number of subintervals into which the whole interval between a and b is to be 
divided. 


2. Pick $N - 1$ points between $a$ and $b$. Call them $x_1$, $x_2$, etc.

$$a = x_0 < x_1 < x_2 < \cdots < x_{N-1} < x_N = b$$

and for convenience label the endpoints as $x_0$ and $x_N$. For the sketch, $N = 8$.

3. Let $\Delta x_k = x_k - x_{k-1}$. That is,

$$\Delta x_1 = x_1 - x_0, \quad \Delta x_2 = x_2 - x_1, \ \cdots$$

4. In each of the $N$ subintervals, pick one point at which the function will be evaluated. I'll label these points by the
Greek letter $\xi$. (That's the Greek version of “x.”)

$$x_{k-1} \le \xi_k \le x_k$$

$$x_0 \le \xi_1 \le x_1, \quad x_1 \le \xi_2 \le x_2, \ \cdots$$

5. Form the sum that is an approximation to the final answer.

$$f(\xi_1)\Delta x_1 + f(\xi_2)\Delta x_2 + f(\xi_3)\Delta x_3 + \cdots$$

6. Finally, take the limit as all the $\Delta x_k \to 0$ and necessarily then, as $N \to \infty$. These six steps form the definition

$$\lim_{\Delta x_k \to 0} f(\xi_1)\Delta x_1 + f(\xi_2)\Delta x_2 + \cdots = \int_a^b f(x)\,dx \qquad (1.22)$$
To demonstrate this numerically, pick a function and do the first five steps explicitly. Pick $f(x) = 1/x$ and
integrate it from 1 to 2. The exact answer is the natural log of 2: $\ln 2 = 0.69315\ldots$

(1) Take $N = 4$ for the number of intervals

(2) Choose to divide the distance from 1 to 2 evenly, at $x_1 = 1.25$, $x_2 = 1.5$, $x_3 = 1.75$

$$a = x_0 = 1. < 1.25 < 1.5 < 1.75 < 2. = x_4 = b$$

(3) All the $\Delta x$'s are equal to 0.25.

(4) Choose the midpoint of each subinterval. This is the best choice when you use a finite number of divisions without
taking the limit.

$$\xi_1 = 1.125 \qquad \xi_2 = 1.375 \qquad \xi_3 = 1.625 \qquad \xi_4 = 1.875$$

(5) The sum approximating the integral is then

$$f(\xi_1)\Delta x_1 + f(\xi_2)\Delta x_2 + f(\xi_3)\Delta x_3 + f(\xi_4)\Delta x_4 = \frac{1}{1.125}\times .25 + \frac{1}{1.375}\times .25 + \frac{1}{1.625}\times .25 + \frac{1}{1.875}\times .25 = .69122$$


For such a small number of divisions, this is a very good approximation, about 0.3% error. (What do you get
if you take $N = 1$ or $N = 2$ or $N = 10$ divisions?)
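
The five steps translate almost line for line into a few lines of Python. This sketch assumes equal subintervals and midpoint evaluation, as in the worked example; `midpoint_sum` is a name made up for the purpose. It also lets you try the other values of N.

```python
import math

def midpoint_sum(f, a, b, N):
    """Riemann sum over [a, b] with N equal subintervals,
    evaluating f at the midpoint of each one."""
    dx = (b - a) / N                       # step 3: every Delta x is the same
    return sum(f(a + (k + 0.5) * dx)       # step 4: midpoints xi_k
               for k in range(N)) * dx     # step 5: the sum of f(xi_k) Delta x

exact = math.log(2)
for N in (1, 2, 4, 10, 100):
    approx = midpoint_sum(lambda x: 1 / x, 1.0, 2.0, N)
    # Columns: N, the sum, and its fractional error
    print(N, approx, (exact - approx) / exact)
```

Watching how the fractional error shrinks as N grows is a small preview of what numerical analysis is about.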


Fundamental Thm. of Calculus 

If the function that you're integrating is complicated or if the function is itself not known to perfect accuracy then a 
numerical approximation just like this one for $\int_1^2 dx/x$ is often the best way to go. How can a function not be known
completely? If it is experimental data. When you have to resort to this arithmetic way to do integrals, are there more
efficient ways to do it than simply using the definition of the integral? Yes. That's part of the subject of numerical
analysis, and there's a short introduction to the subject in chapter 11, section 11.4.

The fundamental theorem of calculus unites the subjects of differentiation and integration. The integral is defined
as the limit of a sum, and the derivative is defined as the limit of a quotient of two differences. The relation between 
them is 

IF $f$ has an integral from $a$ to $b$, that is, if $\int_a^b f(x)\,dx$ exists,

AND IF $f$ has an anti-derivative, that is, there is a function $F$ such that $dF/dx = f$,

THEN

$$\int_a^b f(x)\,dx = F(b) - F(a) \qquad (1.23)$$


Are there cases where one of these exists without the other? Yes, though I'll admit that you are not likely to 
come across such functions without hunting through some advanced math books. Check out www.wikipedia.org for 
Volterra's function to see what it involves. 

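Notice an important result that follows from Eq. (1.23). Differentiate both sides with respect to $b$

$$\frac{d}{db}\int_a^b f(x)\,dx = \frac{d}{db}F(b) = f(b) \qquad (1.24)$$

and with respect to $a$

$$\frac{d}{da}\int_a^b f(x)\,dx = -\frac{d}{da}F(a) = -f(a) \qquad (1.25)$$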

Differentiating an integral with respect to one or the other of its limits results in plus or minus the integrand. Combine 
this with the chain rule and you can do such calculations as 


$$\frac{d}{dx}\int_{x^2}^{\sin x} e^{xt^2}\,dt = e^{x\sin^2 x}\cos x - e^{x^5}\,2x + \int_{x^2}^{\sin x} t^2 e^{xt^2}\,dt$$

All this requires is that you differentiate every $x$ that is present and add the results, just as

$$\frac{d}{dx}x^2 = \frac{d}{dx}\,x\cdot x = \frac{dx}{dx}x + x\frac{dx}{dx} = 1\cdot x + x\cdot 1 = 2x \qquad (1.26)$$


You may well ask why anyone would want to do such a thing as Eq. (1.26), but there are more reasonable examples that 
show up in real situations. I’ve already used this result in Eq. (1.21). 
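A quick numerical check of this kind of calculation is reassuring. The sketch below assumes the integral is $\int_{x^2}^{\sin x} e^{xt^2}\,dt$ and compares a finite-difference derivative with the three-term result: the upper-limit term, the lower-limit term, and the derivative taken under the integral sign. The helper names, the Simpson's rule integrator, and the sample point x = 0.7 are my own choices, not anything from the text.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) intervals; also fine if b < a."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integral(x):
    """I(x): the integral from x^2 to sin(x) of exp(x t^2) dt."""
    return simpson(lambda t: math.exp(x * t * t), x * x, math.sin(x))

def derivative_formula(x):
    """Differentiate every x that is present and add the results."""
    s = math.sin(x)
    upper = math.exp(x * s * s) * math.cos(x)   # from the upper limit, sin x
    lower = math.exp(x ** 5) * 2 * x            # from the lower limit, x^2
    inside = simpson(lambda t: t * t * math.exp(x * t * t), x * x, s)
    return upper - lower + inside

x, h = 0.7, 1e-5
finite_difference = (integral(x + h) - integral(x - h)) / (2 * h)
print(finite_difference, derivative_formula(x))
```

The two printed numbers should agree to many digits; if you drop any one of the three terms they won't.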

Riemann-Stieltjes Integrals 

Are there other useful definitions of the word integral? Yes, there are many, named after various people who developed 
them, with Lebesgue being the most famous. His definition* is most useful in much more advanced mathematical 
contexts, and I won't go into it here, except to say that very roughly where Riemann divided the $x$-axis into intervals 
$\Delta x_k$, Lebesgue divided the $y$-axis into intervals $\Delta y_k$. Doesn't sound like much of a change does it? It is. There is 
another definition that is worth knowing about, not because it helps you to do integrals, but because it unites a couple 
of different types of computation into one. This is the Riemann-Stieltjes integral. You won't need it for any of the later 
work in this book, but it is a fairly simple extension of the Riemann integral and I'm introducing it mostly for its cultural 
value: to show you that there are other ways to define an integral. If you take the time to understand it, you will be 
able to look back at some subjects that you already know and to realize that they can be manipulated in a more compact 
form (e.g. center of mass). 

When you try to evaluate the moment of inertia you are doing the integral 

$$\int r^2\,dm$$

When you evaluate the position of the center of mass even in one dimension the integral is 

$$\frac{1}{M}\int x\,dm$$

and even though you may not yet have encountered this, the electric dipole moment is 

$$\int \vec r\,dq$$

How do you integrate $x$ with respect to $m$? What exactly are you doing? A possible answer is that you can express this 
integral in terms of the linear density function and then $dm = \lambda(x)\,dx$. But if the masses are a mixture of continuous 
densities and point masses, this starts to become awkward. Is there a better way? 

Yes 

On the interval $a \le x \le b$ assume there are two functions, $f$ and $\alpha$. Don't assume that either of them must be 
continuous, though they can't be too badly behaved or nothing will converge. This starts the same way the Riemann 
integral does: partition the interval into a finite number ($N$) of sub-intervals at the points 

$$a = x_0 < x_1 < x_2 < \cdots < x_N = b \tag{1.27}$$


* One of the more notable PhD theses in history 

Form the sum 

$$\sum_{k=1}^{N} f(x'_k)\,\Delta\alpha_k, \quad\text{where}\quad x_{k-1} \le x'_k \le x_k \quad\text{and}\quad \Delta\alpha_k = \alpha(x_k) - \alpha(x_{k-1}) \tag{1.28}$$

To improve the sum, keep adding more and more points to the partition so that in the limit all the intervals $x_k - x_{k-1} \to 0$. 
This limit is called the Riemann-Stieltjes integral, 

$$\int f\,d\alpha \tag{1.29}$$

What's the big deal? Doesn't $d\alpha = \alpha'\,dx$? Use that and you have just the ordinary integral 

$$\int f(x)\,\alpha'(x)\,dx\ ?$$

Sometimes you can, but what if $\alpha$ isn't differentiable? Suppose that it has a step or several steps? The derivative isn't 
defined, but this Riemann-Stieltjes integral still makes perfectly good sense. 

An example. A very thin rod of length $L$ is placed on the $x$-axis with one end at the origin. It has a uniform 
linear mass density $\lambda$ and an added point mass $m_0$ at $x = 3L/4$. (A piece of chewing gum?) Let $m(x)$ be the function 
defined as 

$$m(x) = (\text{the amount of mass at coordinates} \le x) = \begin{cases} \lambda x & (0 \le x < 3L/4) \\ \lambda x + m_0 & (3L/4 \le x \le L) \end{cases}$$

This is of course discontinuous. 



The coordinate of the center of mass is $\int x\,dm \big/ \int dm$. The total mass in the denominator is $m_0 + \lambda L$, and I'll 
go through the details to evaluate the numerator, attempting to solidify the ideas that form this integral. Suppose you 
divide the length $L$ into 10 equal pieces, then 

$$x_k = kL/10, \quad (k = 0, 1, \ldots, 10) \qquad\text{and}\qquad \Delta m_k = \begin{cases} \lambda L/10 & (k \ne 8) \\ \lambda L/10 + m_0 & (k = 8) \end{cases}$$

$\Delta m_8 = m(x_8) - m(x_7) = (\lambda x_8 + m_0) - \lambda x_7 = \lambda L/10 + m_0$. 

Choose the positions $x'_k$ anywhere in the interval; for no particular reason I'll take the right-hand endpoint, 
$x'_k = kL/10$. The approximation to the integral is now 

$$\begin{aligned}
\sum_{k=1}^{10} x'_k\,\Delta m_k &= \sum_{k=1}^{7} x'_k\,\lambda L/10 + x'_8\,(\lambda L/10 + m_0) + \sum_{k=9}^{10} x'_k\,\lambda L/10 \\
&= \sum_{k=1}^{10} x'_k\,\lambda L/10 + x'_8\,m_0
\end{aligned}$$

As you add division points (more intervals) to the whole length this sum obviously separates into two parts. One is the 
ordinary integral and the other is the discrete term from the point mass. 

$$\int_0^L x\lambda\,dx + m_0\,3L/4 = \lambda L^2/2 + m_0\,3L/4$$

The center of mass is then at 

$$x_{\rm cm} = \frac{\lambda L^2/2 + m_0\,3L/4}{m_0 + \lambda L}$$

If $m_0 \ll \lambda L$, this is approximately $L/2$. In the reverse case it is approximately $3L/4$. Both are just what you should 
expect. 

The discontinuity in m(x) simply gives you a discrete added term in the overall result. 
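To watch the sum separate into its two parts, here is a rough sketch of the rod-with-point-mass calculation as a Riemann-Stieltjes sum with right-hand endpoints. The numerical values of the length, density, and point mass are arbitrary assumptions for illustration, and the function names are mine.

```python
def stieltjes_sum(f, alpha, a, b, n):
    """Riemann-Stieltjes sum of f d(alpha) on [a, b]: n equal intervals, right endpoints."""
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (b - a) * (k - 1) / n
        x_k = a + (b - a) * k / n
        total += f(x_k) * (alpha(x_k) - alpha(x_prev))
    return total

# Illustrative values only (assumptions, not from the text)
L, lam, m0 = 2.0, 3.0, 5.0

def m(x):
    """Mass at coordinates <= x: uniform rod plus a point mass at 3L/4."""
    return lam * x + (m0 if x >= 0.75 * L else 0.0)

exact = lam * L**2 / 2 + m0 * 0.75 * L   # ordinary integral plus the discrete term

for n in (10, 100, 1000):
    numerator = stieltjes_sum(lambda x: x, m, 0.0, L, n)
    print(n, numerator, exact, numerator / (m0 + lam * L))   # last column: center of mass
```

With ten intervals the point mass is counted at $x'_8 = 0.8L$ instead of $0.75L$, which is why the coarse sum overshoots; the finer partitions put it where it belongs.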

Did you need the Stieltjes integral to do this? Probably not. You would likely have simply added the two terms 
from the two parts of the mass and gotten the same result as with this more complicated method. The point of this 
is not that it provides an easier way to do computations. It doesn't. It is however a unifying notation and language 
that lets you avoid writing down a lot of special cases. (Is it discrete? Is it continuous?) You can even write sums as 
integrals: Let $\alpha$ be a set of steps: 

$$\alpha(x) = \begin{cases} 0 & x < 1 \\ 1 & 1 \le x < 2 \\ 2 & 2 \le x < 3 \\ \text{etc.} \end{cases} \;=\; [x] \quad\text{for } x \ge 0$$

where that last bracketed symbol means "greatest integer less than or equal to x." It's a notation more common in 
mathematics than in physics. Now in this notation the sum can be written as a Stieltjes integral. 





$$\int f\,d\alpha = \int_{x=0}^{\infty} f\,d[x] = \sum_{k=1}^{\infty} f(k) \tag{1.30}$$


At every integer, where [x] makes a jump by one, there is a contribution to the Riemann-Stieltjes sum, Eq. (1.28). That 
makes this integral just another way to write the sum over integers. This won't help you to sum the series, but it is 
another way to look at the subject. 
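Here is a small sketch of that statement, with the step function taken to be `math.floor`. The choice $f(x) = 1/x^2$, the interval from 0.5 to 10.5, and the number of partition points are my own assumptions for the illustration; the interval contains the integers 1 through 10, so the Stieltjes sum should reproduce $\sum_{k=1}^{10} 1/k^2$.

```python
import math

def stieltjes_sum(f, alpha, a, b, n):
    """Riemann-Stieltjes sum of f d(alpha) on [a, b]: n equal intervals, right endpoints."""
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (b - a) * (k - 1) / n
        x_k = a + (b - a) * k / n
        # This term vanishes unless alpha jumps somewhere in (x_prev, x_k].
        total += f(x_k) * (alpha(x_k) - alpha(x_prev))
    return total

f = lambda x: 1.0 / x**2
as_integral = stieltjes_sum(f, math.floor, 0.5, 10.5, 10000)
as_sum = sum(f(k) for k in range(1, 11))
print(as_integral, as_sum)
```

Only the ten intervals that contain an integer contribute anything; every other term is multiplied by a zero change in $[x]$.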

The method of integration by parts works perfectly well here, though as with all the rest of this material I'll leave 
the proof to advanced calculus texts. If $\int f\,d\alpha$ exists then so does $\int \alpha\,df$ and 

$$\int f\,d\alpha = f\alpha - \int \alpha\,df \tag{1.31}$$


This relates one Stieltjes integral to another one, and because you can express summation as an integral now, you can 
even do summation by parts on the equation (1.30). That's something that you are not likely to think of if you restrict 
yourself to the more elementary notation, and it's even occasionally useful. 

1.7 Polar Coordinates 

When you compute an integral in the plane, you need the element of area appropriate to the coordinate system that 
you're using. In the most common case, that of rectangular coordinates, you find the element of area by drawing the 
two lines at constant coordinates $x$ and $x + dx$. Then you draw the two lines at constant coordinates $y$ and $y + dy$. 
The little rectangle that they circumscribe has an area $dA = dx\,dy$. 


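[Figure: the elements of area bounded by x, x + dx, y, y + dy in rectangular coordinates and by φ, φ + dφ in polar coordinates] 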


In polar coordinates you do exactly the same thing! The coordinates are $r$ and $\phi$, and the lines at constant radius 
$r$ and at constant $r + dr$ define two neighboring circles. The lines at constant angle $\phi$ and at constant angle $\phi + d\phi$ 
form two closely spaced rays from the origin. These four lines circumscribe a tiny area that is, for small enough $dr$ and 
$d\phi$, a rectangle. You then know its area is the product of its two sides*: $dA = (dr)(r\,d\phi)$. This is the basic element of 
area for polar coordinates. 

The area of a circle is the sum of all the pieces of area within it 

$$\int dA = \int_0^R r\,dr \int_0^{2\pi} d\phi$$


I find it more useful to write double integrals in this way, so that the limits of integration are next to the differential. The 
other notation can put the differential a long distance from where you show the limits of integration. I get less confused 
my way. In either case, and to no one's surprise, you get 


$$\int_0^R r\,dr \int_0^{2\pi} d\phi = \int_0^R r\,dr\,2\pi = 2\pi R^2/2 = \pi R^2$$


For the preceding example you can do the double integral in either order with no special care. If the area over 
which you're integrating is more complicated you will have to look more closely at the limits of integration. I'll illustrate 
with an example of this in rectangular coordinates: the area of a triangle. Take the triangle to have vertices $(0, 0)$, $(a, 0)$, 
and $(0, b)$. The area is 

$$\int dA = \int_0^a dx \int_0^{b(a-x)/a} dy \qquad\text{or}\qquad \int_0^b dy \int_0^{a(b-y)/b} dx \tag{1.32}$$

They should both yield $ab/2$. See problem 1.25. 
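If you'd like to see the two orders of integration agree before doing them by hand, this is a minimal numerical sketch. It evaluates the double integral over the triangle with vertices (0, 0), (a, 0), (0, b) by nested midpoint sums, once with the y integral inside and once with the x integral inside. The function names and the values a = 3, b = 2 are assumptions for illustration.

```python
def integrate_y_inside(g, a, b, n=200):
    """Outer integral over x from 0 to a; inner integral over y from 0 to b(a - x)/a."""
    total = 0.0
    dx = a / n
    for i in range(n):
        x = (i + 0.5) * dx
        y_max = b * (a - x) / a      # the upper limit on y depends on x
        dy = y_max / n
        for j in range(n):
            y = (j + 0.5) * dy
            total += g(x, y) * dy * dx
    return total

def integrate_x_inside(g, a, b, n=200):
    """Outer integral over y from 0 to b; inner integral over x from 0 to a(b - y)/b."""
    total = 0.0
    dy = b / n
    for j in range(n):
        y = (j + 0.5) * dy
        x_max = a * (b - y) / b      # the upper limit on x depends on y
        dx = x_max / n
        for i in range(n):
            x = (i + 0.5) * dx
            total += g(x, y) * dx * dy
    return total

a, b = 3.0, 2.0
one = lambda x, y: 1.0               # integrating 1 over the region gives its area
print(integrate_y_inside(one, a, b), integrate_x_inside(one, a, b), a * b / 2)
```

Replace `one` with some other function of x and y and the same two routines integrate it over the triangle.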

1.8 Sketching Graphs 

How do you sketch the graph of a function? This is one of the most important tools you can use to understand the 
behavior of functions, and unless you practice it you will find yourself at a loss in anticipating the outcome of many 
calculations. There are a handful of rules that you can follow to do this and you will find that it's not as painful as you 
may think. 

You are confronted with a function and have to sketch its graph. 


* If you're tempted to say that the area is $dA = dr\,d\phi$, look at the dimensions. This expression is a length, not an 
area. 

1. What is the domain? That is, what is the set of values of the independent variable that you need to be 
concerned with? Is it $-\infty$ to $+\infty$ or is it $0 < x < L$ or is it $-\pi < \phi < \pi$ or what? 

2. Plot any obvious points. If you can immediately see the value of the function at one or more points, do them 
right away. 

3. Is the function even or odd? If the behavior of the function is the same on the left as it is on the right (or 
perhaps inverted on the left) then you have half as much work to do. Concentrate on one side and you can then make 
a mirror image on the left if it is even or an upside-down mirror image if it's odd. 

4. Is the function singular anywhere? Does it go to infinity at some point where the denominator vanishes? Note 
these points on the axis for future examination. 

5. What is the behavior of the function near any of the obvious points that you plotted? Does it behave like $x$? 
Like $x^2$? If you concluded that it is even, then the slope is either zero or there's a kink in the curve, such as with the 
absolute value function, $|x|$. 

6. At one of the singular points that you found, how does it behave as you approach the point from the right? 
From the left? Does the function go toward $+\infty$ or toward $-\infty$ in each case? 

7. How does the function behave as you approach the ends of the domain? If the domain extends from $-\infty$ to 
$+\infty$, how does the function behave as you approach these regions? 

8. Is the function the sum or difference of two other much simpler functions? If so, you may find it easier to 
sketch the two functions and then graphically add or subtract them. Similarly if it is a product. 

9. Is the function related to another by translation? The function $f(x) = (x - 2)^2$ is related to $x^2$ by translation 
of 2 units. Note that it is translated to the right from $x^2$. You can see why because $(x - 2)^2$ vanishes at $x = +2$. 

10. After all this, you will have a good idea of the shape of the function, so you can interpolate the behavior 
between the points that you’ve found. 

Example: sketch $f(x) = x/(a^2 - x^2)$. 

[Figure: the x-axis with the points −a and a marked] 

1. The domain for the independent variable wasn't given, so take it to be $-\infty < x < \infty$. 

2. The point $x = 0$ obviously gives the value $f(0) = 0$. 

4. The denominator becomes zero at the two points $x = \pm a$. 

3. If you replace $x$ by $-x$, the denominator is unchanged, and the numerator changes sign. The function is odd 
about zero. 

7. When $x$ becomes very large ($|x| \gg a$), the denominator is mostly $-x^2$, so $f(x)$ behaves like $x/(-x^2) = -1/x$ 
for large $x$. It approaches zero for large $x$. Moreover, when $x$ is positive, it approaches zero through negative values and 
when $x$ is negative, it goes to zero through positive values. 

5. Near the point $x = 0$, the $x^2$ in the denominator is much smaller than the constant $a^2$ ($x^2 \ll a^2$). That means 
that near this point, the function $f$ behaves like $x/a^2$. 

6. Go back to the places that it blows up, and ask what happens near there. If $x$ is a little greater than $a$, the 
$x^2$ in the denominator is a little larger than the $a^2$ in the denominator. This means that the denominator is negative. 
When $x$ is a little less than $a$, the reverse is true. Near $x = a$, the numerator is close to $a$. Combine these, and you see 
that the function approaches $-\infty$ as $x \to a$ from the right. It approaches $+\infty$ on the left side of $a$. I've already noted 
that the function is odd, so don't repeat the analysis near $x = -a$, just turn this behavior upside down. 

With all of these pieces of the graph, you can now interpolate to see the whole picture. 

OR, if you're clever with partial fractions, you might realize that you can rearrange $f$ as 

$$\frac{x}{a^2 - x^2} = \frac{-1/2}{x - a} + \frac{-1/2}{x + a},$$

and then follow the ideas of techniques 8 and 9 to sketch the graph. It's not obvious that this is any easier; it's just 
different. 
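None of this replaces sketching by hand, but you can check each conclusion about $x/(a^2 - x^2)$ with a few numbers afterward. The sketch below takes a = 1 for convenience (my assumption); the function names are only for illustration.

```python
def f(x, a=1.0):
    """The example function x / (a^2 - x^2)."""
    return x / (a * a - x * x)

def partial_fractions(x, a=1.0):
    """The same function rearranged as (-1/2)/(x - a) + (-1/2)/(x + a)."""
    return -0.5 / (x - a) - 0.5 / (x + a)

a = 1.0
print(f(0.01, a), 0.01 / a**2)            # near zero it behaves like x/a^2
print(f(1000.0, a), -1.0 / 1000.0)        # for large x it behaves like -1/x
print(f(a + 1e-3, a), f(a - 1e-3, a))     # large and negative just right of a, positive just left
print(f(-0.5, a), -f(0.5, a))             # odd about zero
print(f(0.3, a), partial_fractions(0.3, a))
```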


Exercises 

1 Express $e^x$ in terms of hyperbolic functions. 

2 If $\sinh x = 4/3$, what is $\cosh x$? What is $\tanh x$? 

3 If $\tanh x = 5/13$, what is $\sinh x$? What is $\cosh x$? 

4 Let $n$ and $m$ be positive integers. Let $a = n^2 - m^2$, $b = 2nm$, $c = n^2 + m^2$. Show that $a$-$b$-$c$ form the integer sides 
of a right triangle. What are the first three independent "Pythagorean triples?" By that I mean ones that aren't just a 
multiple of one of the others. 

5 Evaluate the integral $\int_0^a dx\,x^2\cos x$. Use parametric differentiation starting with $\cos\alpha x$. 

6 Evaluate $\int_0^a dx\,x\sinh x$ by parametric differentiation. 

7 Differentiate $x e^x \sin x\cosh x$ with respect to $x$. 

8 Differentiate $\int_0^{x^2} dt\,\sin(xt)$ with respect to $x$. 

9 Differentiate die~ xl ' with respect to x. 

10 Differentiate j// : dt sin(a ;f 3 ) with respect to x. 

11 Differentiate ^ e ~ atA Jo(fit) with respect to x. Jo is a Bessel function. 

12 Sketch the function $y = v_0 t - gt^2/2$. (First step: set all constants to one. $v_0 = g = 2 = 1$. Except exponents) 

13 Sketch the function $U = -mgy + ky^2/2$. (Again: set the constant factors to one.) 

14 Sketch $U = mg\ell(1 - \cos\theta)$. 

15 Sketch $V = -V_0 e^{-x^2/a^2}$. 

16 Sketch $x = x_0 e^{-\alpha t}\sin\omega t$. 

17 Is it all right in Eq. (1.22) to replace "$\Delta x_k \to 0$" with "$N \to \infty$?" [No.] 

18 Draw a graph of the curve parametrized as $x = \cos\theta$, $y = \sin\theta$. 

Draw a graph of the curve parametrized as $x = \cosh\theta$, $y = \sinh\theta$. 

19 What is the integral J^dxe~ x2 l 

20 Given that $\int_{-\infty}^{\infty} dx/(1 + x^2) = \pi$, i.e. you don't have to derive this, what then is $\int_{-\infty}^{\infty} dx/(\alpha + x^2)$? Now differentiate 
the result and find the two integrals $\int_{-\infty}^{\infty} dx/(1 + x^2)^2$ and $\int_{-\infty}^{\infty} dx/(1 + x^2)^3$. 

21 Derive the product rule as a special case of Eq. (1.20). 

22 The third paragraph of section 1.6 has two simple equations in arithmetic. What common identities about the 
integral do these correspond to? 

23 Plot a graph of $y = e^x$ with $y$ and $x$ in meters ($x$ horizontal and $y$ vertical). Start at the origin and walk along the 
$x$-axis at one meter per second. When you are at the 20-meter point, where is the $y$ coordinate and how fast is it rising? 
Not just numbers: compare both to real things. 

Problems 

1.1 What is the tangent of an angle in terms of its sine? Draw a triangle and do this in one line. 

1.2 Derive the identities for $\cosh^2\theta - \sinh^2\theta$ and for $1 - \tanh^2\theta$, Equation (1.3). 

1.3 Derive the expressions in Eq. (1.4) for $\cosh^{-1} y$, $\tanh^{-1} y$, and $\coth^{-1} y$. Pay particular attention to the domains 
and explain why these are valid for the set of $y$ that you claim. What is $\sinh^{-1}(y) + \sinh^{-1}(-y)$? 


1.4 The inverse function has a graph that is the mirror image of the original function in the 45° line y = x. Draw the 
graphs of all six of the hyperbolic functions and all six of the inverse hyperbolic functions, comparing the graphs you 
should get to the functions derived in the preceding problem. 

1.5 Evaluate the derivatives of $\cosh x$, $\tanh x$, and $\coth x$. 

1.6 What are the derivatives, $d\sinh^{-1} y/dy$ and $d\cosh^{-1} y/dy$? 

1.7 Find formulas for sinh 2y and cosh 2 y in terms of hyperbolic functions of y. The first one of these should take just 
a couple of lines. Maybe the second one too, so if you find yourself filling a page, start over. 

1.8 Do a substitution to evaluate the integral (a) simply. Now do the same for (b) 

$$\text{(a)}\ \int \frac{dt}{\sqrt{a^2 - t^2}} \qquad\qquad \text{(b)}\ \int \frac{dt}{\sqrt{a^2 + t^2}}$$

1.9 Sketch the two integrands in the preceding problem. For the second integral, if the limits are 0 and $z$ with $z \gg a$, 
then before having done the integral, estimate approximately what the value of this integral should be. (Say $z = 10^6 a$ 
or $z = 10^{60} a$.) Compare your estimate to the exact answer that you just found to see if they match in any way. 

1.10 Fill in the steps in the derivation of the Gaussian integrals, Eqs. (1.7), (1.8), and (1.10). In particular, draw graphs 
of the integrands to show why Eq. (1.7) is so. 

1.11 What is the integral $\int_{-\infty}^{\infty} dt\,t^n e^{-t^2}$ if $n = -1$ or $n = -2$? [Careful!, no conclusion-jumping allowed.] Did you 
draw a graph? No? Then that's why you're having trouble with this. 

1.12 Sketch a graph of the error function. In particular, what is its behavior for small $x$ and for large $x$, both positive 
and negative? Note: "small" doesn't mean zero. First draw a sketch of the integrand and from that you can 
(graphically) estimate $\mathrm{erf}(x)$ for small $x$. Compare this to the short table in Eq. (1.11). 

1.13 Put a parameter $\alpha$ into the defining integral for the error function, Eq. (1.11), so it has $\int_0^x dt\,e^{-\alpha t^2}$. Differentiate 
this integral with respect to $\alpha$. Next, change variables in this same integral from $t$ to $u$: $u^2 = \alpha t^2$, and differentiate 
that integral (which of course has the same value as before) with respect to $\alpha$ to show 

$$\int_0^x dt\,t^2 e^{-t^2} = \frac{\sqrt\pi}{4}\,\mathrm{erf}(x) - \frac{1}{2}\,x e^{-x^2}$$

As a check, does this agree with the previous result for $x = \infty$, Eq. (1.10)? 

1.14 Use parametric differentiation to derive the recursion relation $x\Gamma(x) = \Gamma(x + 1)$. Do it once by inserting a 
parameter in the integral for $\Gamma$, $e^{-t} \to e^{-\alpha t}$, and differentiating. Then change variables before differentiating and equate 
the results. 

1.15 What is the Gamma function of $x = -1/2$, $-3/2$, $-5/2$? Explain why the original definition of $\Gamma$ in terms of 
the integral won't work here. Demonstrate why Eq. (1.12) converges for all $x > 0$ but does not converge for $x < 0$. 
Ans: $\Gamma(-5/2) = -8\sqrt\pi/15$ 

1.16 What is the Gamma function for $x$ near 1? near 0? near $-1$? $-2$? $-3$? Now sketch a graph of the Gamma function 
from $-3$ through positive values. Try using the recursion relation of problem 1.14. Ans: Near $-3$, $\Gamma(x) \approx -1/\big(6(x + 3)\big)$ 

1.17 Show how to express the integral for arbitrary positive $x$ 

$$\int_0^\infty dt\,t^x e^{-t^2}$$

in terms of the Gamma function. Is positive $x$ the best constraint here or can you do a touch better? 

Ans: $\frac{1}{2}\Gamma\big((x + 1)/2\big)$ 

1.18 The derivative of the Gamma function at $x = 1$ is $\Gamma'(1) \simeq -0.5772 = -\gamma$. The number $\gamma$ is called Euler's constant, 
and like $\pi$ or $e$ it's another number that simply shows up regularly. What is $\Gamma'(2)$? What is $\Gamma'(3)$? Ans: $\Gamma'(3) = 3 - 2\gamma$ 

1.19 Show that 

$$\Gamma(n + 1/2) = \frac{\sqrt\pi}{2^n}\,(2n - 1)!!$$

The "double factorial" symbol means the product of every other integer up to the given one. E.g. $5!! = 15$. The double 
factorial of an even integer can be expressed in terms of the single factorial. Do so. What about odd integers? 


1.20 Evaluate this integral. Just find the right substitution. 



(a > 0) 


1.21 A triangle has sides $a$, $b$, $c$, and the angle opposite $c$ is $\gamma$. Express the area of the triangle in terms of $a$, $b$, and 
$\gamma$. Write the law of cosines for this triangle and then use $\sin^2\gamma + \cos^2\gamma = 1$ to express the area of a triangle solely in 
terms of the lengths of its three sides. The resulting formula is not especially pretty or even clearly symmetrical in the 
sides, but if you introduce the semiperimeter, $s = (a + b + c)/2$, you can rearrange the answer into a neat, symmetrical 
form. Check its validity in a couple of special cases. Ans: $\sqrt{s(s - a)(s - b)(s - c)}$ (Hero's formula) 

1.22 An arbitrary linear combination of the sine and cosine, $A\sin\theta + B\cos\theta$, is a phase-shifted cosine: $C\cos(\theta + \delta)$. 
Solve for $C$ and $\delta$ in terms of $A$ and $B$, deriving an identity in $\theta$. 


1.23 Solve the two simultaneous linear equations 


ax + by = e, cx + dy = f 


and do it solely by elementary manipulation ($+$, $-$, $\times$, $\div$), not by any special formulas. Analyze all the qualitatively 
different cases and draw graphs to describe each. In every case, how many if any solutions are there? Because of 
its special importance later, look at the case $e = f = 0$ and analyze it as if it's a separate problem. You should be 
able to discern and to classify the circumstances under which there is one solution, no solution, or many solutions. 
Ans: Sometimes a unique solution. Sometimes no solution. Sometimes many solutions. Draw two lines in the plane; 
how many qualitatively different pictures are there? 

1.24 Use parametric differentiation to evaluate the integral $\int x^2\sin x\,dx$. Find a table of integrals if you want to verify 
your work. 


1.25 Derive all the limits on the integrals in Eq. (1.32) and then do the integrals. 

1.26 Compute the area of a circle using rectangular coordinates, 

1.27 (a) Compute the area of a triangle using rectangular coordinates, so $dA = dx\,dy$. Make it a right triangle with 
vertices at $(0, 0)$, $(a, 0)$, and $(a, b)$. (b) Do it again, but reversing the order of integration. (c) Now compute the area of 
this triangle using polar coordinates. Examine this carefully to see which order of integration makes the problem easier. 


1.28 Start from the definition of a derivative, $\lim\,\big(f(x + \Delta x) - f(x)\big)/\Delta x$, and derive the chain rule. 

Now pick special, fairly simple cases for g and h to test whether your result really works. That is, choose functions so 
that you can do the differentiation explicitly and compare the results, but also functions with enough structure that they 
aren't trivial. 

1.29 Starting from the definitions, derive how to do the derivative, 

d ffW 


d r^ x > 

dxi 3it)dt 


Now pick special, fairly simple cases for / and g to test whether your result really works. That is, choose functions so 
that you can do the integration and differentiation explicitly, but ones such the result isn't trivial. 

1.30 Sketch these graphs, working by hand only, no computers: 

$$\frac{x^2}{a^2 + x^2}, \qquad \frac{x^2}{a^2 - x^2}, \qquad \frac{x}{a^3 + x^3}, \qquad \frac{x - a}{a^2 - (x - a)^2}, \qquad \frac{x}{L^2 - x^2} + \frac{x}{L}$$


1.31 Sketch by hand only, graphs of 

$$\sin x \ \ (-3\pi < x < +4\pi), \qquad \frac{1}{\sin x} \ \ (-3\pi < x < +4\pi), \qquad \sin(x - \pi/2) \ \ (-3\pi < x < +4\pi)$$

1.32 Sketch by hand only, graphs of 

$$f(\phi) = 1 + \tfrac{1}{2}\sin^2\phi \ \ (0 < \phi < 2\pi), \qquad f(\phi) = \begin{cases} \phi & (0 < \phi < \pi) \\ \phi - 2\pi & (\pi < \phi < 2\pi) \end{cases}$$

$$f(x) = \begin{cases} x^2 & (0 < x < a) \\ (x - 2a)^2 & (a < x < 2a) \end{cases}, \qquad f(r) = \begin{cases} Kr/R^3 & (0 < r < R) \\ K/r^2 & (R < r < \infty) \end{cases}$$

1.33 From the definition of the Riemann integral make a numerical calculation of the integral 

$$\int_0^1 dx\,\frac{4}{1 + x^2}$$

Use 1 interval, then 2 intervals, then 4 intervals. If you choose to write your own computer program for an arbitrary 
number of intervals, by all means do so. As with the example in the text, choose the midpoints of the intervals to 
evaluate the function. To check your answer, do a trig substitution and evaluate the integral exactly. What is the % 
error from the exact answer in each case? [100×(wrong − right)/right] Ans: $\pi$ 


1.34 Evaluate erf(l) numerically. Use 4 intervals. Ans: 0.842700792949715 (more or less) 

1.35 Evaluate $\int_0^\pi dx\,\sin x/x$ numerically. Ans: 1.85193705198247 or so. 

1.36 $x$ and $y$ are related by the equation $x^3 - 4xy + 3y^3 = 0$. You can easily check that $(x, y) = (1, 1)$ satisfies it, 
now what is $dy/dx$ at that point? Unless you choose to look up and plug in to the cubic formula, I suggest that you 
differentiate the whole equation with respect to $x$ and solve for $dy/dx$. 

Generalize this to finding $dy/dx$ if $f(x, y) = 0$. Ans: 1/5 

1.37 When flipping a coin $N$ times, what fraction of the time will the number of heads in the run lie between $\big(N/2 - 2\sqrt{N/2}\big)$ 
and $\big(N/2 + 2\sqrt{N/2}\big)$? What are these numbers for $N = 1000$? Ans: 99.5% 

1.38 For $N = 4$ flips of a coin, count the number of times you get 0, 1, 2, etc. heads out of $2^4 = 16$ cases. Compare 
these results to the exponential approximation of Eq. (1.17). 

Ans: 2 → 0.375 and 0.399 

1.39 Is the integral of Eq. (1.17) over all $\delta$ equal to one? 

1.40 If there are 100 molecules of a gas bouncing around in a room, about how long will you have to wait to find that 
all of them are in the left half of the room? Assume that you make a new observation every microsecond and that the 
observations are independent of each other. Ans: A million times the age of the universe. [Care to try $10^{23}$ molecules?] 

1.41 If you flip 1000 coins 1000 times, about how many times will you get exactly 500 heads and 500 tails? What if 
it's 100 coins and 100 trials, getting 50 heads? Ans: 25, 8 

1.42 (a) Use parametric differentiation to evaluate $\int x\,dx$. Start with $\int e^{\alpha x}\,dx$. Differentiate and then let $\alpha \to 0$. 

(b) Now that the problem has blown up in your face, change the integral from an indefinite to a definite integral such 
as $\int_a^b$ and do it again. There are easier ways to do this integral, but the point is that this method is really designed for 
definite integrals. It may not work on indefinite ones. 

1.43 The Gamma function satisfies the identity 

$$\Gamma(x)\,\Gamma(1 - x) = \pi/\sin\pi x$$

What does this tell you about the Gamma function of 1/2? What does it tell you about its behavior near the negative 
integers? Compare this result to that of problem 1.16. 

1.44 Start from the definition of a derivative, manipulate some terms: (a) derive the rule for differentiating the function 
h, where h(x) = f(x)g(x) is the product of two other functions. 

(b) Integrate the resulting equation with respect to x and derive the formula for integration by parts. 

1.45 Show that in polar coordinates the equation $r = 2a\cos\phi$ is a circle. Now compute its area in this coordinate 
system. 

1.46 The cycloid* has the parametric equations $x = a\theta - a\sin\theta$, and $y = a - a\cos\theta$. Compute the area, $\int y\,dx$, 
between one arc of this curve and the $x$-axis. Ans: $3\pi a^2$ 

1.47 An alternate approach to the problem 1.13: Change variables in the integral definition of erf to t = au. Now 
differentiate with respect to a and of course the derivative must be zero and there’s your answer. Do the same thing 
for problem 1.14 and the Gamma function. 

1.48 Recall section 1.5 and compute this second derivative to show 

$$\frac{d^2}{dx^2}\int_0^x dt\,(x - t)f(t) = f(x)$$

1.49 From the definition of a derivative show that 

$$\text{If } x = f(\theta) \text{ and } t = g(\theta) \text{ then } \quad \frac{dx}{dt} = \frac{df/d\theta}{dg/d\theta}$$

Make up a couple of functions that let you test this explicitly. 

* www-groups.dcs.st-and.ac.uk/~history/Curves/Cycloid.html 

1.50 Redo problem 1.6 another way: $x = \sinh^{-1} y \leftrightarrow y = \sinh x$. Differentiate the second of these with respect to $y$ 
and solve for $dx/dy$. Ans: $d\sinh^{-1} y/dy = 1/\sqrt{1 + y^2}$. 

Infinite Series 

Infinite series are among the most powerful and useful tools that you've encountered in your introductory calculus course. 
It's easy to get the impression that they are simply a clever exercise in manipulating limits and in studying convergence, 
but they are among the major tools used in analyzing differential equations, in developing methods of numerical analysis, 
in defining new functions, in estimating the behavior of functions, and more. 

2.1 The Basics 

There are a handful of infinite series that you should memorize and should know just as well as you do the multiplication 
table. The first of these is the geometric series, 

$$1 + x + x^2 + x^3 + x^4 + \cdots = \sum_0^\infty x^n = \frac{1}{1 - x} \quad\text{for } |x| < 1. \tag{2.1}$$

It's very easy to derive because in this case you can sum the finite form of the series and then take a limit. Write the series 
out to the term $x^N$ and multiply it by $(1 - x)$. 

$$\begin{aligned}
&(1 + x + x^2 + x^3 + \cdots + x^N)(1 - x) = \\
&\quad (1 + x + x^2 + x^3 + \cdots + x^N) - (x + x^2 + x^3 + x^4 + \cdots + x^{N+1}) = 1 - x^{N+1}
\end{aligned} \tag{2.2}$$

If $|x| < 1$ then as $N \to \infty$ this last term, $x^{N+1}$, goes to zero and you have the answer. If $x$ is outside this domain the 
terms of the infinite series don't even go to zero, so there's no chance for the series to converge to anything. 

The finite sum up to $x^N$ is useful on its own. For example it's what you use to compute the payments on a loan 
that's been made at some specified interest rate. You use it to find the pattern of light from a diffraction grating. 

$$\sum_0^N x^n = \frac{1 - x^{N+1}}{1 - x}$$
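A few lines of arithmetic make the finite sum and its limit concrete. This is only a sketch; the function names and the sample values of x and N are my own choices.

```python
def geometric_partial_sum(x, n):
    """1 + x + x^2 + ... + x^n, added up term by term."""
    return sum(x**k for k in range(n + 1))

def closed_form(x, n):
    """The same finite sum from the formula (1 - x^(n+1)) / (1 - x), valid for x != 1."""
    return (1 - x**(n + 1)) / (1 - x)

# The finite formula holds whether or not |x| < 1.
for x in (0.5, -0.9, 2.0):
    print(x, geometric_partial_sum(x, 20), closed_form(x, 20))

# Only for |x| < 1 does the sum settle down to 1/(1 - x) as N grows.
x = 0.5
for n in (5, 10, 50):
    print(n, geometric_partial_sum(x, n), 1 / (1 - x))
```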


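Some other common series that you need to know are power series for elementary functions: 

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots = \sum_0^\infty \frac{x^k}{k!} \tag{2.3}$$

$$\begin{aligned}
\sin x &= x - \frac{x^3}{3!} + \cdots = \sum_0^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!} \\
\cos x &= 1 - \frac{x^2}{2!} + \cdots = \sum_0^\infty (-1)^k \frac{x^{2k}}{(2k)!} \\
\ln(1 + x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_1^\infty (-1)^{k+1} \frac{x^k}{k} \qquad (|x| < 1) \\
(1 + x)^\alpha &= 1 + \alpha x + \frac{\alpha(\alpha - 1)x^2}{2!} + \cdots = \sum_{k=0}^\infty \frac{\alpha(\alpha - 1)\cdots(\alpha - k + 1)}{k!}\,x^k \qquad (|x| < 1)
\end{aligned} \tag{2.4}$$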


Of course, even better than memorizing them is to understand their derivations so well that you can derive them 
as fast as you can write them down. For example, the cosine is the derivative of the sine, so if you know the latter series 
all you have to do is to differentiate it term by term to get the cosine series. The logarithm of $(1 + x)$ is an integral 
of $1/(1 + x)$ so you can get its series from that of the geometric series. The geometric series is a special case of the 
binomial series for $\alpha = -1$, but it's easier to remember the simple case separately. You can express all of them as special 
cases of the general Taylor series. 

What is the sine of 0.1 radians? Just use the series for the sine and you have the answer, 0.1, or to more accuracy, 
$0.1 - 0.001/6 = 0.099833$ 

What is the square root of 1.1? $\sqrt{1.1} = (1 + .1)^{1/2} \approx 1 + \frac{1}{2}\cdot 0.1 = 1.05$ 

What is 1/1.9? $1/(2 - .1) = 1/[2(1 - .05)] \approx \frac{1}{2}(1 + .05) = .5 + .025 = .525$ from the first terms of the geometric 
series. 

What is $\sqrt[3]{1024}$? $\sqrt[3]{1024} = \sqrt[3]{1000 + 24} = \sqrt[3]{1000(1 + 24/1000)} = 
10(1 + 24/1000)^{1/3} \approx 10(1 + 8/1000) = 10.08$ 

As you see from the last two examples you have to cast the problem into a form fitting the expansion that you 
know. When you want to use the binomial series, rearrange and factor your expression so that you have 

$$(1 + \text{something small})^\alpha$$
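How good are these quick estimates? The short sketch below sets each one next to the value from Python's `math` module. The dictionary layout and its labels are just my way of organizing the comparison.

```python
import math

# Each entry: (estimate from the first terms of the series, value from the math library)
estimates = {
    "sin(0.1)": (0.1 - 0.1**3 / 6, math.sin(0.1)),
    "sqrt(1.1)": (1 + 0.5 * 0.1, math.sqrt(1.1)),
    "1/1.9": (0.5 * (1 + 0.05), 1 / 1.9),
    "cube root of 1024": (10 * (1 + 24 / 3000), 1024 ** (1 / 3)),
}

for name, (approx, exact) in estimates.items():
    print(f"{name}: series {approx:.6f}, library {exact:.6f}, difference {approx - exact:+.1e}")
```

In every case the error is about the size of the first term that was dropped, which is the usual rule of thumb for a rapidly converging series.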


2.2 Deriving Taylor Series 

How do you derive these series? The simplest way to get any of them is to assume that such a series exists and then to 
deduce its coefficients in sequence. Take the sine for example, assume that you can write 

$$\sin x = A + Bx + Cx^2 + Dx^3 + Ex^4 + \cdots$$

Evaluate this at $x = 0$ to get 

$$\sin 0 = 0 = A + B\,0 + C\,0^2 + D\,0^3 + E\,0^4 + \cdots = A$$

so the first term, $A = 0$. Now differentiate the series, getting 

$$\cos x = B + 2Cx + 3Dx^2 + 4Ex^3 + \cdots$$

Again set $x = 0$ and all the terms on the right except the first one vanish. 

$$\cos 0 = 1 = B + 2C\,0 + 3D\,0^2 + 4E\,0^3 + \cdots = B$$

Keep repeating this process, evaluating in turn all the coefficients of the assumed series. 

$$\begin{aligned}
\sin x &= A + Bx + Cx^2 + Dx^3 + Ex^4 + \cdots & \sin 0 &= 0 = A \\
\cos x &= B + 2Cx + 3Dx^2 + 4Ex^3 + \cdots & \cos 0 &= 1 = B \\
-\sin x &= 2C + 6Dx + 12Ex^2 + \cdots & -\sin 0 &= 0 = 2C \\
-\cos x &= 6D + 24Ex + 60Fx^2 + \cdots & -\cos 0 &= -1 = 6D \\
\sin x &= 24E + 120Fx + \cdots & \sin 0 &= 0 = 24E \\
\cos x &= 120F + \cdots & \cos 0 &= 1 = 120F
\end{aligned}$$


This shows the terms of the series for the sine as in Eq. (2.4). 
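The same bookkeeping is easy to automate. In this sketch the only input is the cycle of values that the derivatives of the sine take at zero (0, 1, 0, −1); each coefficient is then the n-th derivative at zero divided by n!, which is what the column of equations A = 0, B = 1, 2C = 0, 6D = −1, and so on amounts to. The function and variable names are mine.

```python
import math

# Derivatives of sin at 0 cycle through sin 0, cos 0, -sin 0, -cos 0.
DERIVATIVES_AT_ZERO = [0.0, 1.0, 0.0, -1.0]

def sine_coefficients(count):
    """Coefficients A, B, C, ... of 1, x, x^2, ...: the n-th derivative at 0 over n!."""
    return [DERIVATIVES_AT_ZERO[n % 4] / math.factorial(n) for n in range(count)]

def partial_sum(coefficients, x):
    """Evaluate the truncated power series at x."""
    return sum(c * x**n for n, c in enumerate(coefficients))

coeffs = sine_coefficients(6)    # A through F
print(coeffs)                    # A, B, C, D, E, F in order
print(partial_sum(coeffs, 0.5), math.sin(0.5))
```

This evaluates the assumed series; it says nothing about whether the series converges, or converges to the sine.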

Does this show that the series converges? If it converges does it show that it converges to the sine? No to both. 
Each statement requires more work, and I’ll leave the second one to advanced calculus books. Even better, when you 
understand the subject of complex variables, these questions about series become much easier to understand. 

The generalization to any function is obvious. You match the coefficients in the assumed expansion, and get 

$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \frac{x^3}{3!} f'''(0) + \frac{x^4}{4!} f''''(0) + \cdots$$

You don’t have to do the expansion about the point zero. Do it about another point instead. 

$$f(t) = f(t_0) + (t - t_0) f'(t_0) + \frac{(t - t_0)^2}{2!} f''(t_0) + \cdots \tag{2.5}$$

What good are infinite series? 

This is sometimes the way that a new function is introduced and developed, typically by determining a series solution to 
a new differential equation. (Chapter 4) 

This is a tool for the numerical evaluation of functions. 

This is an essential tool to understand and invent numerical algorithms for integration, differentiation, interpolation, and 
many other common numerical methods. (Chapter 11) 

To understand the behavior of complex-valued functions of a complex variable you will need to understand these series 
for the case that the variable is a complex number. (Chapter 14) 

All the series that I’ve written above are power series (Taylor series), but there are many other possibilities. 

$$\zeta(z) = \sum_{k=1}^{\infty} \frac{1}{k^z} \tag{2.6}$$

$$x^2 = \frac{L^2}{3} + \frac{4L^2}{\pi^2} \sum_{n=1}^{\infty} (-1)^n \frac{1}{n^2} \cos\left(\frac{n\pi x}{L}\right) \qquad (-L < x < L) \tag{2.7}$$

The first is a Dirichlet series defining the Riemann zeta function, a function that appears in statistical mechanics among 
other places. 

The second is an example of a Fourier series. See chapter five for more of these. 

Still another type of series is the Frobenius series, useful in solving differential equations: its form is $\sum_k a_k x^{k+s}$. The 
number $s$ need not be either positive or an integer. Chapter four has many examples of this form. 

There are a few technical details about infinite series that you have to go through. In introductory calculus courses 
there can be a tendency to let these few details overwhelm the subject so that you are left with the impression that 
that’s all there is, not realizing that this stuff is useful. Still, you do need to understand it.* 

* For animations showing how fast some of these power series converge, check out 
www.physics.miami.edu/nearing/mathmethods/power-animations.html 

2.3 Convergence 

Does an infinite series converge? Does the limit as $N \to \infty$ of the sum, $\sum_{k=1}^{N} u_k$, exist? There are a few common and 
useful ways to answer this. The first and really the foundation for the others is the comparison test. 

Let $u_k$ and $v_k$ be sequences of real numbers, positive at least after some value of $k$. Also assume that for all $k$ 
greater than some finite value, $u_k < v_k$. Also assume that the sum, $\sum_k v_k$ does converge. 

The other sum, $\sum_k u_k$ then converges too. This is almost obvious, but it's worth the little effort that a proof takes. 

The required observation is that an increasing sequence of real numbers, bounded above, has a limit. 

After some point, $k = M$, all the $u_k$ and $v_k$ are positive and $u_k < v_k$. The sum $a_n = \sum_{k=M}^{n} v_k$ then forms an 
increasing sequence of real numbers, so by assumption this has a limit (the series converges). The sum $b_n = \sum_{k=M}^{n} u_k$ is 
an increasing sequence of real numbers also. Because $u_k < v_k$ you immediately have $b_n < a_n$ for all $n$. 

$$b_n < a_n \le \lim_{n \to \infty} a_n$$

This simply says that the increasing sequence $b_n$ has an upper bound, so it has a limit and the theorem is proved. 

Ratio Test 

To apply this comparison test you need a stable of known convergent series. One that you do have is the geometric 
series, $\sum_k x^k$ for $|x| < 1$. Let this $x^k$ be the $v_k$ of the comparison test. Assume at least after some point $k = K$ that 
all the $u_k > 0$. 

Also that $u_{k+1} < x u_k$. 

Then $u_{K+2} < x u_{K+1}$ and $u_{K+1} < x u_K$ gives $u_{K+2} < x^2 u_K$ 

You see the immediate extension is 

$$u_{K+n} < x^n u_K$$

As long as $x < 1$ this is precisely set up for the comparison test using $\sum_n u_K x^n$ as the series that dominates the $\sum_n u_n$. 
This test, the ratio test, is more commonly stated for positive $u_k$ as 

$$\text{If for large } k, \quad \frac{u_{k+1}}{u_k} < x < 1 \quad \text{then the series converges} \tag{2.8}$$

This is one of the more commonly used convergence tests, not because it’s the best, but because it’s simple and it works 
a lot of the time. 

Integral Test 

The integral test is another way to check for convergence or divergence. If $f$ is a decreasing positive function and you 
want to determine the convergence of $\sum_n f(n)$, you can look at the integral $\int^\infty dx\, f(x)$ and check it for convergence. 
The series and the integral converge or diverge together. 

From the graph you see that the function $f$ lies between the tops of the upper and the lower rectangles. The 
area under the curve of $f$ between $n$ and $n + 1$ lies between the areas of the two rectangles. That's the reason for the 
assumption that $f$ is decreasing and positive. 


$$f(n) \cdot 1 > \int_n^{n+1} dx\, f(x) > f(n+1) \cdot 1$$

Add these inequalities from $n = k$ to $n = \infty$ and you get 

$$f(k) + f(k+1) + \cdots > \int_k^{k+1} + \int_{k+1}^{k+2} + \cdots = \int_k^\infty dx\, f(x)$$

$$> f(k+1) + f(k+2) + \cdots > \int_{k+1}^\infty dx\, f(x) > \cdots \tag{2.9}$$


The only difference between the infinite series on the left and on the right is one term, so either everything converges or 
everything diverges. 

You can do better than this and use these inequalities to get a quick estimate of the sum of a series that would 
be too tedious to sum by itself. For example 


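$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \sum_{n=4}^{\infty} \frac{1}{n^2}$$

This last sum lies between two integrals. 

$$\int_3^\infty dx\, \frac{1}{x^2} > \sum_{n=4}^{\infty} \frac{1}{n^2} > \int_4^\infty dx\, \frac{1}{x^2} \tag{2.10}$$

that is, between 1/3 and 1/4. Now I'll estimate the whole sum by adding the first three terms explicitly and taking the 
arithmetic average of these two bounds. 

$$\sum_{n=1}^{\infty} \frac{1}{n^2} \approx 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{2}\left(\frac{1}{3} + \frac{1}{4}\right) = 1.653 \tag{2.11}$$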


The exact sum is more nearly 1.644934066848226, but if you use brute-force addition of the original series to achieve 
accuracy equivalent to this 1.653 estimation you will need to take about 120 terms. This series converges, but not very 
fast. See also problem 2.24. 
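Here is a minimal sketch of that estimate and of the brute-force comparison. It uses the fact that the exact sum quoted above is $\pi^2/6$; the function names and the choice of stopping rule are illustrative only.

```python
import math

def zeta2_estimate(n_explicit):
    """Add the first n_explicit terms of sum 1/n^2, then estimate the tail by
    averaging the two integral bounds 1/n_explicit and 1/(n_explicit + 1)."""
    head = sum(1 / n**2 for n in range(1, n_explicit + 1))
    upper = 1 / n_explicit        # integral of dx/x^2 from n_explicit to infinity
    lower = 1 / (n_explicit + 1)  # integral of dx/x^2 from n_explicit + 1 to infinity
    return head + 0.5 * (upper + lower)

def terms_needed(target_error):
    """Number of terms brute-force addition needs to get within target_error."""
    exact = math.pi**2 / 6
    total, n = 0.0, 0
    while exact - total > target_error:
        n += 1
        total += 1 / n**2
    return n

exact = math.pi**2 / 6
estimate = zeta2_estimate(3)
brute = terms_needed(abs(estimate - exact))
print(f"estimate={estimate:.4f} exact={exact:.6f} brute-force terms={brute}")
```

Raising `n_explicit` from 3 to 10 improves the estimate far faster than adding the same number of terms to the brute-force sum does.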


Quicker Comparison Test 

There is another way to handle the comparison test that works very easily and quickly (if it's applicable). Look at the 
terms of the series for large n and see what the approximate behavior of the n th term is. That provides a comparison 
series. This is better shown by an example: 


$$\sum_{n=1}^{\infty} \frac{n^3 - 2n + 1/n}{5n^5 + \sin n}$$

For large $n$, the numerator is essentially $n^3$ and the denominator is essentially $5n^5$, so for large $n$ this series is approximately 
like 

$$\sum_{n}^{\infty} \frac{1}{5n^2}$$

More precisely, the ratio of the $n$th term of this approximate series to that of the first series goes to one as $n \to \infty$. This 
comparison series converges, so the first one does too. If one of the two series diverges, then the other does too. 

Apply the ratio test to the series for e x . 


$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \qquad \text{so} \qquad \frac{u_{k+1}}{u_k} = \frac{x^{k+1}/(k+1)!}{x^k/k!} = \frac{x}{k+1}$$

As $k \to \infty$ this quotient approaches zero no matter the value of $x$. This means that the series converges for all $x$. 
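A small numerical sketch of this ratio: for a fairly large $x$ the terms first grow, but once $k + 1$ exceeds $x$ every ratio is below one and the partial sums settle down. The value $x = 10$ and the function names are arbitrary choices for illustration.

```python
import math

def term_ratio(x, k):
    """u_{k+1}/u_k for the exponential series, which simplifies to x/(k+1)."""
    return x / (k + 1)

def exp_partial_sum(x, n_terms):
    """Sum the first n_terms of the series, building each term from the ratio."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= term_ratio(x, k)
    return total

x = 10.0
for k in (0, 5, 9, 20, 99):
    print(f"k={k:3d} ratio={term_ratio(x, k):.3f}")
print(exp_partial_sum(x, 60), math.exp(x))
```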

Absolute Convergence 

If a series has terms of varying signs, that should help the convergence. A series is absolutely convergent if it converges 
when you replace each term by its absolute value. If it’s absolutely convergent then it will certainly be convergent when 
you reinstate the signs. An example of a series that is convergent but not absolutely convergent is 

$$\sum_{k=1}^{\infty} (-1)^{k+1} \frac{1}{k} = \ln(1 + 1) = \ln 2 \tag{2.12}$$

Change all the minus signs to plus and the series is divergent. (Use the integral test.) 

Can you rearrange the terms of an infinite series? Sometimes yes and sometimes no. If a series is convergent but 
not absolutely convergent, then each of the two series, the positive terms and the negative terms, is separately divergent. 
In this case you can rearrange the terms of the series to converge to anything you want! Take the series above that 
converges to $\ln 2$. I want to rearrange the terms so that it converges to $\sqrt{2}$. Easy. Just start adding the positive terms 
until you’ve passed $\sqrt{2}$. Stop and now start adding negative ones until you're below that point. Stop and start adding 
positive terms again. Keep going and you can get to any number you want. 

$$1 + \frac{1}{3} + \frac{1}{5} - \frac{1}{2} + \frac{1}{7} + \frac{1}{9} + \frac{1}{11} + \frac{1}{13} - \cdots$$
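The rearrangement recipe is easy to try numerically. The sketch below is one way to code it, with an illustrative function name: it adds unused positive terms while the running total is at or below the target and unused negative terms otherwise.

```python
import math

def rearranged_partial_sums(target, n_steps):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ...
    Returns the running total and the signed denominators in the order used."""
    total = 0.0
    next_odd, next_even = 1, 2   # next unused positive and negative terms
    order = []
    for _ in range(n_steps):
        if total <= target:
            total += 1 / next_odd
            order.append(next_odd)
            next_odd += 2
        else:
            total -= 1 / next_even
            order.append(-next_even)
            next_even += 2
    return total, order

total, order = rearranged_partial_sums(math.sqrt(2), 20000)
print(order[:9])
print(total, math.sqrt(2))
```

Changing the target to any other real number works the same way, which is the point of the argument: both the positive and the negative terms separately sum to infinity, so there is always enough left to cross the target again.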


2.4 Series of Series 

When you have a function whose power series you need, there are sometimes easier ways to the result than a straightforward 
attack. Not always, but you should look first. If you need the expansion of $e^{ax^2+bx}$ about the origin you can do 
a lot of derivatives, using the general form of the Taylor expansion. Or you can say 

$$e^{ax^2+bx} = 1 + (ax^2 + bx) + \frac{1}{2}(ax^2 + bx)^2 + \frac{1}{6}(ax^2 + bx)^3 + \cdots \tag{2.13}$$

and if you need the individual terms, expand the powers of the binomials and collect like powers of $x$: 

$$1 + bx + (a + b^2/2)x^2 + (ab + b^3/6)x^3 + \cdots$$

If you're willing to settle for an expansion about another point, complete the square in the exponent 

$$e^{ax^2+bx} = e^{a(x^2 + bx/a)} = e^{a(x^2 + bx/a + b^2/4a^2) - b^2/4a} = e^{a(x + b/2a)^2 - b^2/4a} = e^{a(x + b/2a)^2} e^{-b^2/4a}$$

$$= e^{-b^2/4a}\left[1 + a(x + b/2a)^2 + a^2(x + b/2a)^4/2 + \cdots\right]$$

and this is a power series expansion about the point $x_0 = -b/2a$. 

What is the power series expansion of the secant? You can go back to the general formulation and differentiate a 
lot or you can use a combination of two known series, the cosine and the geometric series. 

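$$\frac{1}{\cos x} = \frac{1}{1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots} = \frac{1}{1 - \left[\frac{1}{2!}x^2 - \frac{1}{4!}x^4 + \cdots\right]}$$

$$= 1 + [\ ] + [\ ]^2 + [\ ]^3 + \cdots$$

$$= 1 + \left[\tfrac{1}{2!}x^2 - \tfrac{1}{4!}x^4 + \cdots\right] + \left[\tfrac{1}{2!}x^2 - \tfrac{1}{4!}x^4 + \cdots\right]^2 + \cdots$$

$$= 1 + \frac{1}{2!}x^2 + \left(-\frac{1}{4!} + \frac{1}{(2!)^2}\right)x^4 + \cdots$$

$$= 1 + \frac{1}{2!}x^2 + \frac{5}{24}x^4 + \cdots \tag{2.14}$$

This is a geometric series, each of whose terms is itself an infinite series. It still beats plugging into the general formula 
for the Taylor series Eq. (2.5). 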

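What is $1/\sin^3 x$? 

$$\frac{1}{\sin^3 x} = \frac{1}{\left(x - x^3/3! + x^5/5! - \cdots\right)^3} = \frac{1}{x^3\left(1 - x^2/3! + x^4/5! - \cdots\right)^3}$$

$$= \frac{1}{x^3(1 + z)^3} = \frac{1}{x^3}\left(1 - 3z + 6z^2 - \cdots\right)$$

$$= \frac{1}{x^3}\left(1 - 3\left(-x^2/3! + x^4/5! - \cdots\right) + 6\left(-x^2/3! + x^4/5! - \cdots\right)^2 - \cdots\right)$$

$$= \frac{1}{x^3} + \frac{1}{2x} + \frac{51x}{360} + \cdots$$

which is a Frobenius series. 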
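Both of these manipulations are mechanical enough to hand to a few lines of code. The sketch below stores a truncated power series as a list of exact fractions and builds the geometric series of a series; the helper names `multiply` and `geometric` and the truncation order are assumptions made for this illustration.

```python
from fractions import Fraction
from math import factorial

ORDER = 6  # keep powers x^0 through x^6

def multiply(p, q):
    """Multiply two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * (ORDER + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= ORDER:
                out[i + j] += pi * qj
    return out

def geometric(u):
    """1/(1 - u) = 1 + u + u^2 + ... for a series u with no constant term."""
    assert u[0] == 0
    total = [Fraction(1)] + [Fraction(0)] * ORDER
    power = total[:]
    for _ in range(ORDER):
        power = multiply(power, u)
        total = [t + p for t, p in zip(total, power)]
    return total

# cos x = 1 - u, with u = x^2/2! - x^4/4! + x^6/6!
cos_series = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0)
              for k in range(ORDER + 1)]
u = [-c for c in cos_series]
u[0] = Fraction(0)
sec_series = geometric(u)

# sin x / x = 1 + z, with z = -x^2/3! + x^4/5! - x^6/7!
z = [Fraction((-1) ** (k // 2), factorial(k + 1)) if k % 2 == 0 else Fraction(0)
     for k in range(ORDER + 1)]
z[0] = Fraction(0)
inv = geometric([-c for c in z])             # 1/(1 + z)
bracket = multiply(multiply(inv, inv), inv)  # (1 + z)^(-3)

print("sec x coefficients:", sec_series)
print("x^3 / sin^3 x coefficients:", bracket)
```

Dividing the second list by $x^3$ shifts every power down by three, which gives the Frobenius form found above.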

2.5 Power series, two variables 

The idea of a power series can be extended to more than one variable. One way to develop it is to use exactly the same 
sort of brute-force approach that I used for the one-variable case. Assume that there is some sort of infinite series and 
successively evaluate its terms. 

$$f(x, y) = A + Bx + Cy + Dx^2 + Exy + Fy^2 + Gx^3 + Hx^2y + Ixy^2 + Jy^3 + \cdots$$

Include all the possible linear, quadratic, cubic, and higher order combinations. Just as with the single variable, evaluate 
it at the origin, the point (0,0). 

$$f(0, 0) = A + 0 + 0 + \cdots = A$$

Now differentiate, but this time you have to do it twice, once with respect to x while y is held constant and once with 
respect to y while x is held constant. 

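$$\frac{\partial f}{\partial x}(x, y) = B + 2Dx + Ey + \cdots \qquad \text{then} \qquad \frac{\partial f}{\partial x}(0, 0) = B$$

$$\frac{\partial f}{\partial y}(x, y) = C + Ex + 2Fy + \cdots \qquad \text{then} \qquad \frac{\partial f}{\partial y}(0, 0) = C$$

Three more partial derivatives of these two equations give the next terms. 

$$\frac{\partial^2 f}{\partial x^2}(x, y) = 2D + 6Gx + 2Hy + \cdots$$

$$\frac{\partial^2 f}{\partial x\,\partial y}(x, y) = E + 2Hx + 2Iy + \cdots$$

$$\frac{\partial^2 f}{\partial y^2}(x, y) = 2F + 2Ix + 6Jy + \cdots$$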

Evaluate these at the origin and you have the values of D, E , and F. Keep going and you have all the coefficients. 

This is awfully cumbersome, but mostly because of the crude notation that I’ve used. You can make it look less 
messy simply by choosing a more compact notation. If you do it neatly it’s no harder to write the series as an expansion 
about any point, not just the origin. 

$$f(x, y) = \sum_{m,n=0}^{\infty} A_{mn} (x - a)^m (y - b)^n \tag{2.15}$$

Differentiate this $m$ times with respect to $x$ and $n$ times with respect to $y$, then set $x = a$ and $y = b$. Only one term 
survives and that is 

$$\frac{\partial^{m+n} f}{\partial x^m\, \partial y^n}(a, b) = m!\,n!\,A_{mn}$$


I can use subscripts to denote differentiation so that $\partial f/\partial x$ is $f_x$ and $\partial^3 f/\partial x^2 \partial y$ is $f_{xxy}$. Then the two-variable Taylor 
expansion is 

$$f(x, y) = f(0) + f_x(0)x + f_y(0)y + \frac{1}{2}\left[f_{xx}(0)x^2 + 2f_{xy}(0)xy + f_{yy}(0)y^2\right]$$

$$+ \frac{1}{3!}\left[f_{xxx}(0)x^3 + 3f_{xxy}(0)x^2y + 3f_{xyy}(0)xy^2 + f_{yyy}(0)y^3\right] + \cdots \tag{2.16}$$

Again put more order into the notation and rewrite the general form using $A_{mn}$ as 

$$A_{mn} = \frac{1}{(m+n)!}\left(\frac{(m+n)!}{m!\,n!}\right)\frac{\partial^{m+n} f}{\partial x^m\, \partial y^n}(a, b) \tag{2.17}$$

That factor in parentheses is variously called the binomial coefficient or a combinatorial factor. Standard notations for 
it are 

$$\binom{m}{n} = {}_mC_n = \frac{m!}{n!\,(m-n)!} \tag{2.18}$$

The binomial series, Eq. (2.4), for the case of a positive integer exponent is 

$$(1 + x)^m = \sum_{n=0}^{m} \binom{m}{n} x^n, \qquad \text{or more symmetrically} \qquad (a + b)^m = \sum_{n=0}^{m} \binom{m}{n} a^n b^{m-n} \tag{2.19}$$

$$(a + b)^2 = a^2 + 2ab + b^2, \qquad (a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3,$$

$$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4, \quad \text{etc.}$$

Its relation to combinatorial analysis is that if you ask how many different ways can you choose $n$ objects from a collection 
of $m$ of them, ${}_mC_n$ is the answer. 
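To see the two-variable coefficients at work, here is a sketch using a test function chosen only because its derivatives are easy: for $f(x, y) = e^{x+2y}$ every mixed derivative at the origin is $2^n$, so $A_{mn} = 2^n/(m!\,n!)$. The function names and the truncation order are assumptions of this example. The second function checks the symmetric binomial expansion with `math.comb`.

```python
import math

def taylor_exp_x_plus_2y(x, y, max_order):
    """Sum A_mn x^m y^n for f(x, y) = exp(x + 2y) about the origin,
    keeping all terms with m + n <= max_order."""
    total = 0.0
    for m in range(max_order + 1):
        for n in range(max_order + 1 - m):
            a_mn = 2**n / (math.factorial(m) * math.factorial(n))
            total += a_mn * x**m * y**n
    return total

def binomial_expand(a, b, m):
    """(a + b)^m written as the sum over n of C(m, n) a^n b^(m-n)."""
    return sum(math.comb(m, n) * a**n * b**(m - n) for n in range(m + 1))

print(taylor_exp_x_plus_2y(0.3, 0.2, 8), math.exp(0.3 + 2 * 0.2))
print(binomial_expand(2, 3, 4), (2 + 3) ** 4)
```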

2.6 Stirling’s Approximation 

The Gamma function for positive integers is a factorial. A clever use of infinite series and Gaussian integrals provides a 
useful approximate value for the factorial of large n. 

$$n! \sim \sqrt{2\pi n}\; n^n e^{-n} \qquad \text{for large } n \tag{2.20}$$

Start from the Gamma function of $n + 1$. 

$$n! = \Gamma(n + 1) = \int_0^\infty dt\, t^n e^{-t} = \int_0^\infty dt\, e^{-t + n \ln t}$$

The integrand starts at zero, increases, and drops back down to zero as $t \to \infty$. The graph roughly resembles a 
Gaussian, and I can make this more precise by expanding the exponent around the point where it is a maximum. The 
largest contribution to the whole integral comes from the region near this point. Differentiate the exponent to find the 
maximum: 

$$\frac{d}{dt}\left(-t + n \ln t\right) = -1 + \frac{n}{t} = 0 \qquad \text{gives} \qquad t = n$$

Expand about this point 

$$f(t) = -t + n \ln t = f(n) + (t - n) f'(n) + (t - n)^2 f''(n)/2 + \cdots$$

$$= -n + n \ln n + 0 + (t - n)^2 (-n/n^2)/2 + \cdots$$

Keep terms to the second order and the integral is approximately 

$$n! \sim \int_0^\infty dt\, e^{-n + n \ln n - (t - n)^2/2n} = n^n e^{-n} \int_0^\infty dt\, e^{-(t - n)^2/2n} \tag{2.21}$$

At the lower limit of the integral, at $t = 0$, this integrand is $e^{-n/2}$, so if $n$ is even moderately large then extending the 
range of the integral to the whole line $-\infty$ to $+\infty$ won’t change the final answer much. 

$$n^n e^{-n} \int_{-\infty}^{\infty} dt\, e^{-(t - n)^2/2n} = n^n e^{-n} \sqrt{2\pi n}$$


where the final integral is just the simplest of the Gaussian integrals in Eq. (1.10). 
To see how good this is, try a few numbers 


 n    n!         Stirling       ratio    difference
 1    1          0.922          0.922    0.078
 2    2          1.919          0.960    0.081
 5    120        118.019        0.983    1.981
10    3628800    3598695.619    0.992    30104.381

You can see that the ratio of the exact to the approximate result is approaching one even though the difference is getting 
very large. This is not a handicap, as there are many circumstances for which this is all you need. This derivation 
assumed that $n$ is large, but notice that the result is not too bad even for modest values. The error is less than 2% 
for $n = 5$. There are even some applications, especially in statistical mechanics, in which you can make a still cruder 
approximation and drop the factor $\sqrt{2\pi n}$. That is because in that context it is the logarithm of $n!$ that appears, and 
the ratio of the logarithms of the exact and even this cruder approximate number goes to one for large $n$. Try it. 

Although I've talked about Stirling’s approximation in terms of factorials, it started with the Gamma function, so 
Eq. (2.20) works just as well for $\Gamma(n + 1)$ for any real $n$: 

$\Gamma(11.34 = 10.34 + 1) = 8116833.918$ and Stirling gives 8051701. 
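The table is easy to regenerate. This sketch evaluates Eq. (2.20) directly and compares it with `math.factorial` and, for a non-integer argument, with `math.gamma`; the function name `stirling` is just a label for this example.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2 pi n) n^n e^(-n), Eq. (2.20)."""
    return math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)

print(" n        n!      Stirling  ratio  difference")
for n in (1, 2, 5, 10):
    exact = math.factorial(n)
    approx = stirling(n)
    print(f"{n:2d} {exact:9d} {approx:13.3f}  {approx / exact:.3f}  {exact - approx:.3f}")

# The same formula for a non-integer argument, compared with the Gamma function.
n = 10.34
print(math.gamma(n + 1), stirling(n))
```

For the cruder statistical-mechanics version mentioned above, compare `n * math.log(n) - n` with `math.log(math.factorial(n))` for a large `n` and look at the ratio rather than the difference.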

Asymptotic 

You may have noticed the symbol $\sim$ that I used in Eqs. (2.20) and (2.21). It doesn't mean “approximately equal to” or 
“about,” because as you see here the difference between $n!$ and the Stirling approximation grows with $n$. That the ratio 
goes to one is the important point here and it gets this special symbol, “asymptotic to.” 

Probability Distribution 

In section 1.4 the equation (1.17) describes the distribution of the results when you toss a coin. It's straightforward to 
derive this from Stirling's formula. In fact it is just as easy to do a version of it for which the coin is biased, or more 
generally, for any case that one of the choices is more likely than the other. 

Suppose that the two choices will come up at random with fractions a and b, where a + b = 1. You can still 
picture it as a coin toss, but using a very unfair coin. Perhaps a = 1/3 of the time it comes up tails and b = 2/3 of the 
time it comes up heads. If you toss two coins, the possibilities are 

TT     HT     TH     HH 

and the fractions of the time that you get each pair are respectively 

$a^2$     $ba$     $ab$     $b^2$ 

This says that the fraction of the time that you get no heads, one head, or two heads are 

$$a^2 = \tfrac{1}{9}, \quad 2ab = \tfrac{4}{9}, \quad b^2 = \tfrac{4}{9} \qquad \text{with total} \qquad (a + b)^2 = a^2 + 2ab + b^2 = 1 \tag{2.22}$$

Generalize this to the case where you throw $N$ coins at a time and determine how often you expect to see 0, 1, 
. . . , $N$ heads. Equation (2.19) says 

$$(a + b)^N = \sum_{k=0}^{N} \binom{N}{k} a^k b^{N-k} \qquad \text{where} \qquad \binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$

When you make a trial in which you toss $N$ coins, you expect that the “a” choice will come up $N$ times only the fraction 
$a^N$ of the trials. All tails and no heads. Compare problem 2.27. 

The problem is now to use Stirling’s formula to find an approximate result for the terms of this series. This is the 
fraction of the trials in which you turn up k tails and N — k heads. 


$$a^k b^{N-k} \frac{N!}{k!\,(N-k)!} \sim a^k b^{N-k} \frac{\sqrt{2\pi N}\; N^N e^{-N}}{\sqrt{2\pi k}\; k^k e^{-k}\, \sqrt{2\pi (N-k)}\; (N-k)^{N-k} e^{-(N-k)}}$$

$$= a^k b^{N-k} \frac{1}{\sqrt{2\pi}} \sqrt{\frac{N}{k(N-k)}}\; \frac{N^N}{k^k (N-k)^{N-k}} \tag{2.23}$$


The complicated parts to manipulate are the factors with all the exponentials of k in them. Pull them out from the 
denominator for separate handling, leaving the square roots behind. 

$$k^k (N-k)^{N-k} a^{-k} b^{-(N-k)}$$

The next trick is to take a logarithm and to do all the manipulations on it. 

$$\ln \to k \ln k + (N-k) \ln(N-k) - k \ln a - (N-k) \ln b = f(k) \tag{2.24}$$

The original function is a maximum when this denominator is a minimum. When the numbers $N$ and $k$ are big, you can 
treat $k$ as a continuous variable and differentiate with respect to it. Then set this derivative to zero and finally, expand 
in a power series about that point. 

$$\frac{d}{dk} f(k) = \ln k + 1 - \ln(N-k) - 1 - \ln a + \ln b = 0$$

$$\ln \frac{k}{N-k} = \ln \frac{a}{b}, \qquad \frac{k}{N-k} = \frac{a}{b}, \qquad k = aN$$


This should be no surprise; a is the fraction of the time the first choice occurs, and it says that the most likely number 
of times that it occurs is that fraction times the number of trials. At this point, what is the second derivative? 


$$\text{when } k = aN, \qquad \frac{d^2}{dk^2} f(k) = \frac{1}{k} + \frac{1}{N-k} = \frac{1}{aN} + \frac{1}{N - aN} = \frac{1}{aN} + \frac{1}{bN} = \frac{1}{abN}$$

About this point the power series for $f(k)$ is 

$$f(k) = f(aN) + (k - aN) f'(aN) + \tfrac{1}{2}(k - aN)^2 f''(aN) + \cdots$$

$$= N \ln N + \frac{1}{2abN}(k - aN)^2 + \cdots \tag{2.25}$$

To substitute this back into Eq. (2.23), take its exponential. Then because this will be a fairly sharp maximum, only the 
values of $k$ near to $aN$ will be significant. That allows me to use this central value of $k$ in the slowly varying square 
root coefficient of that equation, and I can also neglect higher order terms in the power series expansion there. Let 
$\delta = k - aN$. The result is the Gaussian distribution. 

$$\frac{1}{\sqrt{2\pi}} \sqrt{\frac{N}{aN(N - aN)}}\; \frac{N^N}{N^N e^{\delta^2/2abN}} = \frac{1}{\sqrt{2\pi abN}}\; e^{-\delta^2/2abN} \tag{2.26}$$


When a = b = 1/2, this reduces to Eq. (1.17). 
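A direct numerical comparison shows how good Eq. (2.26) is. The sketch below evaluates the exact binomial term with `math.comb` and the Gaussian approximation for the biased coin used above ($a = 1/3$); the value $N = 300$ and the function names are arbitrary choices for illustration.

```python
import math

def binomial_fraction(N, k, a):
    """Exact fraction of trials with k 'a' outcomes: C(N, k) a^k b^(N-k)."""
    b = 1 - a
    return math.comb(N, k) * a**k * b**(N - k)

def gaussian_fraction(N, k, a):
    """Gaussian approximation, Eq. (2.26), with delta = k - aN."""
    b = 1 - a
    delta = k - a * N
    return math.exp(-delta**2 / (2 * a * b * N)) / math.sqrt(2 * math.pi * a * b * N)

N, a = 300, 1 / 3
for k in (80, 90, 100, 110, 120):
    print(f"k={k:3d} exact={binomial_fraction(N, k, a):.5f} gauss={gaussian_fraction(N, k, a):.5f}")
```

The agreement is best near the peak at $k = aN$ and degrades in the tails, as you'd expect from an expansion about the maximum.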

When you accumulate N trials at a time (large N) and then look for the distribution in these cumulative results, 
you will commonly get a Gaussian. This is the central limit theorem, which says that whatever set of probabilities that 
you start with, not just a coin toss, you will get a Gaussian by averaging the data. (Not really true. There are some 
requirements* on the probabilities that aren't always met, but if as here the variable has a bounded domain then it’s o.k. 
See problems 17.24 and 17.25 for a hint of where a naive assumption that all distributions behave the same way that 
Gaussians do can be misleading.) If you listen to the clicks of a counter that records radioactive decays, they sound (and 
are) random, and the time interval between the clicks varies greatly. If you set the electronics to click at every tenth 
count, the result will sound regular, and the time interval between clicks will vary only slightly. 


2.7 Useful Tricks 

There are a variety of ways to manipulate series, and while some of them are simple they are probably not the sort of 
thing you’d think of until you've seen them once. Example: what is the sum of 


$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots\ ?$$


Introduce a parameter that you can manipulate, like the parameter you sometimes introduce to do integrals as in Eq. (1.5). 
Consider the series with the parameter $x$ in it. 

$$f(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots \tag{2.27}$$

* finite mean and variance 

Differentiate this with respect to $x$ to get 

$$f'(x) = 1 - x^2 + x^4 - x^6 + x^8 - \cdots$$


That looks a bit like the geometric series except that it has only even powers and the signs alternate. Is that too great 
an obstacle? As 1/(1 — x) has only plus signs, then change x to —x, and 1/(1 + x) alternates in sign. Instead of x as 
a variable, use $x^2$, then you get exactly what you're looking for. 

$$f'(x) = 1 - x^2 + x^4 - x^6 + x^8 - \cdots = \frac{1}{1 + x^2}$$

Now to get back to the original series, which is $f(1)$ recall, all that I need to do is integrate this expression for $f'(x)$. 
The lower limit is zero, because $f(0) = 0$. 

$$f(1) = \int_0^1 dx\, \frac{1}{1 + x^2} = \tan^{-1} x \,\Big|_0^1 = \frac{\pi}{4}$$

This series converges so slowly however that you would never dream of computing $\pi$ this way. If you take 100 terms, 
the next term is 1/201 and you can get a better approximation to $\pi$ by using 22/7. 

The geometric series is $1 + x + x^2 + x^3 + \cdots$, but what if there's an extra factor in front of each term? 

$$f(x) = 2 + 3x + 4x^2 + 5x^3 + \cdots$$


Multiply this by $x$ and it is $2x + 3x^2 + 4x^3 + 5x^4 + \cdots$, starting to look like a derivative. 

$$x f(x) = 2x + 3x^2 + 4x^3 + 5x^4 + \cdots = \frac{d}{dx}\left(x^2 + x^3 + x^4 + \cdots\right)$$

Again, the geometric series pops up, though missing a couple of terms. 

$$x f(x) = \frac{d}{dx}\left(1 + x + x^2 + x^3 + \cdots - 1 - x\right) = \frac{d}{dx}\left[\frac{1}{1 - x} - 1 - x\right] = \frac{1}{(1 - x)^2} - 1$$

The final result is then 

$$f(x) = \frac{1}{x}\left[\frac{1 - (1 - x)^2}{(1 - x)^2}\right] = \frac{2 - x}{(1 - x)^2}$$
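Both results in this section are simple to check numerically. The sketch below sums the alternating series for $\pi/4$ to show how slowly it converges, and compares a long partial sum of $2 + 3x + 4x^2 + \cdots$ with the closed form; the function names and the test value $x = 0.5$ are illustrative choices.

```python
import math

def leibniz_partial_sum(n_terms):
    """Partial sum of 1 - 1/3 + 1/5 - 1/7 + ..., which converges to pi/4."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

def weighted_geometric(x, n_terms):
    """Partial sum of 2 + 3x + 4x^2 + 5x^3 + ..."""
    return sum((k + 2) * x**k for k in range(n_terms))

def closed_form(x):
    """The summed form (2 - x)/(1 - x)^2, valid for |x| < 1."""
    return (2 - x) / (1 - x) ** 2

print("100 terms:", 4 * leibniz_partial_sum(100), " 22/7:", 22 / 7, " pi:", math.pi)
print(weighted_geometric(0.5, 200), closed_form(0.5))
```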
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2.8 Diffraction 

When light passes through a very small opening it will be diffracted so that it will spread out in a characteristic pattern 
of higher and lower intensity. The analysis of the result uses many of the tools that you've looked at in the first two 
chapters, so it's worth showing the derivation first. 

The light that is coming from the left side of the figure has a wavelength $\lambda$ and wave number $k = 2\pi/\lambda$. The 
light passes through a narrow slit of width = a. The Huygens construction for the light that comes through the slit says 
that you can effectively treat each little part of the slit as if it is a source of part of the wave that comes through to 
the right. (As a historical note, the mathematical justification for this procedure didn’t come until about 150 years after 
Huygens proposed it, so if you think it isn’t obvious why it works, you're right.) 



Call the coordinate along the width of the slit $y$, where $0 < y < a$. I want to find the total light wave that 
passes through the slit and that heads at the angle $\theta$ away from straight ahead. The light that passes through between 
coordinates $y$ and $y + dy$ is a wave 

$$A\, dy\, \cos(kr - \omega t)$$

Its amplitude is proportional to the amplitude of the incoming wave, A, and to the width dy that I am considering. The 
coordinate along the direction of the wave is r. The total wave that will head in this direction is the sum (integral) over 
all these little pieces of the slit. 

Let $r_0$ be the distance measured from the bottom of the slit to where the light is received far away. Find the value 
of $r$ by doing a little trigonometry, getting 

$$r = r_0 - y \sin\theta$$

The total wave to be received is now the integral 

$$\int_0^a A\, dy\, \cos\left(k(r_0 - y \sin\theta) - \omega t\right) = A\, \frac{\sin\left(k(r_0 - y \sin\theta) - \omega t\right)}{-k \sin\theta}\, \Bigg|_0^a$$

Put in the limits to get 

$$\frac{A}{-k \sin\theta}\left[\sin\left(k(r_0 - a \sin\theta) - \omega t\right) - \sin\left(k r_0 - \omega t\right)\right]$$

I need a trigonometric identity here, one that you can easily derive with the techniques of complex algebra in chapter 3. 


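$$\sin x - \sin y = 2 \sin\left(\frac{x - y}{2}\right) \cos\left(\frac{x + y}{2}\right) \tag{2.28}$$

Use this and the light amplitude is 

$$\frac{2A}{-k \sin\theta}\, \sin\left(-\frac{ka}{2} \sin\theta\right) \cos\left(k\left(r_0 - \frac{a}{2} \sin\theta\right) - \omega t\right) \tag{2.29}$$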


The wave is the cosine factor. It is a cosine of ($k \cdot$ distance $- \omega t$), and the distance in question is the distance to 
the center of the slit. This is then a wave that appears to be coming from the middle of the slit, but with an amplitude 
that varies strongly with angle. That variation comes from the other factors in Eq. (2.29). 

It’s the variation with angle that's important. The intensity of the wave, the power per area, is proportional to 
the square of the wave's amplitude. I'm going to ignore all the constant factors, so there's no need to worry about the 
constant of proportionality. The intensity is then (up to a factor) 


$$I = \frac{\sin^2\left((ka/2) \sin\theta\right)}{\sin^2\theta} \tag{2.30}$$


For light, the wavelength is about 400 to 700 nm, and the slit may be a millimeter or a tenth of a millimeter. The size 
of ka/ 2 is then about 

$$ka/2 = \pi a/\lambda \approx 3 \cdot 0.1\ \text{mm}/500\ \text{nm} \approx 1000$$

When you plot this intensity versus angle, the numerator vanishes when the argument of $\sin^2()$ is $n\pi$, with $n$ an integer, 
+, -, or 0. This says that the intensity vanishes in these directions except for $\theta = 0$. In that case the denominator 
vanishes too, so you have to look closer. For the simpler case that $\theta \neq 0$, these angles are 

$$n\pi = \frac{ka}{2} \sin\theta \approx \frac{ka}{2}\, \theta, \qquad n = \pm 1, \pm 2, \ldots$$

Because $ka$ is big, you have many values of $n$ before the approximation that $\sin\theta = \theta$ becomes invalid. You can rewrite 
this in terms of the wavelength because $k = 2\pi/\lambda$. 

$$n\pi = \frac{2\pi a}{2\lambda}\, \theta, \qquad \text{or} \qquad \theta = n\lambda/a$$

What happens at zero? Use power series expansions to evaluate this indeterminate form. The first term in the 
series expansion of the sine is $\theta$ itself, so 

$$I = \frac{\sin^2\left((ka/2) \sin\theta\right)}{\sin^2\theta} \longrightarrow \frac{\left((ka/2)\theta\right)^2}{\theta^2} = \left(\frac{ka}{2}\right)^2 \tag{2.31}$$

What is the behavior of the intensity near $\theta = 0$? Again, use power series expansions, but keep another term 

$$\sin\theta = \theta - \frac{1}{6}\theta^3 + \cdots \qquad \text{and} \qquad (1 + x)^{\alpha} = 1 + \alpha x + \cdots$$


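Remember, $ka/2$ is big! This means that it makes sense to keep just one term of the sine expansion for $\sin\theta$ itself, but 
you'd better keep an extra term in the expansion of the $\sin^2(ka \ldots)$. 

$$\frac{\sin^2\left((ka/2)\theta\right)}{\theta^2} = \frac{1}{\theta^2}\left[\frac{ka}{2}\theta - \frac{1}{6}\left(\frac{ka}{2}\theta\right)^3 + \cdots\right]^2$$

$$= \frac{1}{\theta^2}\left(\frac{ka}{2}\theta\right)^2\left[1 - \frac{1}{6}\left(\frac{ka}{2}\theta\right)^2 + \cdots\right]^2$$

$$= \left(\frac{ka}{2}\right)^2\left[1 - \frac{1}{3}\left(\frac{ka}{2}\theta\right)^2 + \cdots\right]$$

When you use the binomial expansion, put the binomial in the standard form, $(1 + x)$ as in the second line of these 
equations. What is the shape of this function? Forget all the constants, and it looks like $1 - \theta^2$. That's a parabola. 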


The dots are the points where the intensity goes to zero, $n\lambda/a$. Between these directions it reaches a maximum. 
How big is it there? These maxima are about halfway between the points where $(ka \sin\theta)/2 = n\pi$. This is 

$$\frac{ka}{2} \sin\theta = \left(n + \tfrac{1}{2}\right)\pi, \qquad n = \pm 1, \pm 2, \ldots$$

At these angles the value of $I$ is, from Eq. (2.30), 

$$I = \left(\frac{ka}{2}\right)^2 \left(\frac{1}{(2n + 1)\pi/2}\right)^2$$

The intensity at $\theta = 0$ is by Eq. (2.31), $(ka/2)^2$, so the maxima off to the side have intensities that are smaller than 
this by factors of 

$$\frac{1}{9\pi^2/4} = 0.045, \qquad \frac{1}{25\pi^2/4} = 0.016, \quad \ldots$$
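These numbers can be checked straight from Eq. (2.30). The sketch below assumes a value for $ka/2$ (600, roughly the estimate above) and evaluates the intensity at a zero and at the first two side maxima; the function names are illustrative.

```python
import math

def intensity(theta, ka_half):
    """Eq. (2.30) up to a constant factor; the theta -> 0 limit is ka_half**2."""
    s = math.sin(theta)
    if s == 0.0:
        return ka_half**2
    return math.sin(ka_half * s) ** 2 / s**2

KA_HALF = 600.0  # assumed value of ka/2 for this example

def side_maximum_ratio(n):
    """Intensity halfway between the n-th and (n+1)-th zeros, relative to the center."""
    theta = math.asin((n + 0.5) * math.pi / KA_HALF)
    return intensity(theta, KA_HALF) / intensity(0.0, KA_HALF)

first_zero = math.asin(math.pi / KA_HALF)
print("intensity at first zero:", intensity(first_zero, KA_HALF))
print("side maxima relative to center:", side_maximum_ratio(1), side_maximum_ratio(2))
```

Note that the ratios do not depend on the assumed `KA_HALF`; only the angles at which the maxima occur do.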



2.9 Checking Results 

When you solve any problem, or at least think that you've solved it, you're not done. You still have to check to see 
whether your result makes any sense. If you are dealing with a problem whose solution is in the back of the book then do 
you think that the author is infallible? If there is no back of the book and you're working on something that you would 
like to publish, do you think that you're infallible? Either way you can't simply assume that you've made no mistakes; 
you have to look at your answer skeptically. 

There's a second reason, at least as important, to examine your results: that’s where you can learn some physics 
and gain some intuition. Solving a complex problem and getting a complicated answer may involve a lot of mathematics 
but you don't usually gain any physical insight from doing it. When you analyze your results you can gain an understanding 
of how the mathematical symbols are related to physical reality. Often an approximate answer to a complicated problem 
can give you more insight than an exact one, especially if the approximate answer is easier to analyze. 

The first tool that you have to use at every opportunity is dimensional analysis. If you are computing a length 
and your result is a velocity then you are wrong. If you have something in your result that involves adding a time to an 
acceleration or an angle to a distance, then you’ve made a mistake; go back and find it. You can do this sort of analysis 
everywhere, and it is one technique that provides an automatic error finding mechanism. If an equation is dimensionally 
inconsistent, backtrack a few lines and see whether the units are wrong there too. If they are correct then you know that 
your error occurred between those two lines; then narrow the region further by looking for the place at which the 
dimensions changed from consistent to inconsistent, and that’s where the mistake happened. 

The second tool in your analysis is to examine all the parameters that occur in the result and to see what happens 
when you vary them. Especially see what happens when you push them to an extreme value. This is best explained by 
some examples. Start with some simple mechanics to see the procedure. 

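[Figure: mass $m_1$ on a table, connected by a string over a pulley of mass $M$ to a hanging mass $m_2$; $a_x$ is the acceleration of $m_1$.] 

Two masses are attached by a string of negligible mass that is wrapped around a pulley of mass $M$ so that it 
can’t slip on the pulley. Here are three proposed results for the acceleration $a_x$; analyze them to determine what is wrong with each. Assume that there is no friction between 
$m_1$ and the table and that the string does not slip on the pulley. 

$$\text{(a)}\ a_x = \frac{m_2 - m_1}{m_2 + m_1}\, g \qquad \text{(b)}\ a_x = \frac{m_2}{m_2 + m_1 - M/2}\, g \qquad \text{(c)}\ a_x = \frac{m_2 - M/2}{m_2 + m_1 + M/2}\, g$$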

(a) If $m_1 > m_2$, this is negative, meaning that the motion of $m_1$ is being slowed down. But there's no friction or other such force to do this.

OR If $m_1 = m_2$, this is zero, but there are still unbalanced forces causing these masses to accelerate.

(b) If the combination of masses is just right, for example $m_1 = 1$ kg, $m_2 = 1$ kg, and $M = 4$ kg, the denominator is zero. The expression for $a_x$ blows up, a very serious problem.

OR If $M$ is very large compared to the other masses, the denominator is negative, meaning that $a_x$ is negative and the acceleration is a braking. Without friction, this is impossible.

(c) If $M \gg m_1$ and $m_2$, the numerator is mostly $-M/2$ and the denominator is mostly $+M/2$. This makes the whole expression negative, meaning that $m_1$ and $m_2$ are slowing down. There is no friction to do this, and all the forces are in the direction to cause acceleration toward positive $x$.

OR If $m_2 = M/2$, this equals zero, saying that there is no acceleration, but in this system, $a_x$ will always be positive.
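
This kind of limit-pushing is easy to hand to a short program. What follows is only a minimal Python sketch, not part of the original analysis: the function names, the probe masses, and the value $g = 9.8\ \mathrm{m/s^2}$ are assumptions chosen for illustration. It evaluates the three proposed frictionless answers at a few extreme parameter choices so that a negative, zero, or undefined acceleration stands out.

```python
G = 9.8  # m/s^2; assumed value, only the sign and size of a_x matter here


def a_candidate_a(m1, m2, M):
    """Proposed answer (a): (m2 - m1) / (m2 + m1) * g."""
    return (m2 - m1) / (m2 + m1) * G


def a_candidate_b(m1, m2, M):
    """Proposed answer (b): m2 / (m2 + m1 - M/2) * g."""
    return m2 / (m2 + m1 - M / 2) * G


def a_candidate_c(m1, m2, M):
    """Proposed answer (c): (m2 - M/2) / (m2 + m1 + M/2) * g."""
    return (m2 - M / 2) / (m2 + m1 + M / 2) * G


# Hypothetical probe values (kg), each pushing one parameter to an extreme.
probes = {
    "m1 much larger than m2": (100.0, 1.0, 1.0),
    "m1 equal to m2": (1.0, 1.0, 1.0),
    "denominator of (b) vanishes": (1.0, 1.0, 4.0),
    "very massive pulley": (1.0, 1.0, 1000.0),
}

candidates = (("a", a_candidate_a), ("b", a_candidate_b), ("c", a_candidate_c))

for label, (m1, m2, M) in probes.items():
    for name, formula in candidates:
        try:
            result = f"{formula(m1, m2, M):+.3f} m/s^2"
        except ZeroDivisionError:
            result = "undefined (division by zero)"
        print(f"{label:30s} ({name}) a_x = {result}")
```

With no friction every physically acceptable answer has to come out positive and finite, so any line of output that is negative, zero, or undefined marks a formula that fails the test.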


The same picture, but with friction $\mu_k$ between $m_1$ and the table.

$$\text{(a)}\ a_x = \frac{m_2}{m_2 + \mu_k m_1 + M/2}\,g \qquad \text{(b)}\ a_x = \frac{m_2 - \mu_k m_1}{m_2 - M/2}\,g \qquad \text{(c)}\ a_x = \frac{m_2}{m_2 + \mu_k m_1 - M/2}\,g$$


(a) If $\mu_k$ is very large, this approaches zero. Large friction should cause $m_1$ to brake to a halt quickly with very large negative $a_x$.

OR If there is no friction, $\mu_k = 0$, then $m_1$ plays no role in this result, but if it is big then you know that it will decrease the downward acceleration of $m_2$.

(b) The denominator can vanish. If $m_2 = M/2$ this is nonsense.

(c) This suffers from both of the difficulties of (a) and (b).


Trajectory Example 

When you toss an object straight up with an initial speed $v_0$, you may expect an answer for the motion as a function of time to be something like
$$v_y(t) = v_0 - gt, \qquad y(t) = v_0 t - \tfrac{1}{2}gt^2 \tag{2.32}$$


Should you expect this? Not if you remember that there's air resistance. If I claim that the answers are
$$v_y(t) = -v_t + (v_0 + v_t)e^{-gt/v_t}, \qquad y(t) = -v_t t + (v_0 + v_t)\frac{v_t}{g}\left[1 - e^{-gt/v_t}\right] \tag{2.33}$$
then this claim has to be inspected to see if it makes sense. And I never bothered to tell you what the expression "$v_t$" means anyway. You have to figure that out. Fortunately that's not difficult in this case. What happens to these equations for very large time? The exponentials go to zero, so
$$v_y \to -v_t + (v_0 + v_t)\cdot 0 = -v_t, \qquad\text{and}\qquad y \to -v_t t + (v_0 + v_t)\frac{v_t}{g}$$


$v_t$ is the terminal speed. After a long enough time a falling object will reach a speed for which the force by gravity and the force by the air will balance each other and the velocity then remains constant.

Do they satisfy the initial conditions? Yes:
$$v_y(0) = -v_t + (v_0 + v_t)e^0 = v_0, \qquad y(0) = 0 + (v_0 + v_t)\frac{v_t}{g}\cdot(1 - 1) = 0$$


What do these behave like for small time? They ought to reduce to something like the expressions in Eq. (2.32), 
but just as important is to determine what the deviation from that simple form is. Keep some extra terms in the series 
expansion. How many extra terms? If you're not certain, then keep one more than you think you will need. After some 
experience you will usually be able to anticipate what to do. Expand the exponential: 


$$v_y(t) = -v_t + (v_0 + v_t)\left[1 + \frac{-gt}{v_t} + \frac{1}{2}\left(\frac{-gt}{v_t}\right)^2 + \cdots\right] = v_0 - \left(1 + \frac{v_0}{v_t}\right)gt + \frac{1}{2}\left(1 + \frac{v_0}{v_t}\right)\frac{g^2t^2}{v_t} + \cdots$$

The coefficient of $t$ says that the object is slowing down more rapidly than it would have without air resistance. So far, so good. Is the factor right? Not yet clear, so keep going. Did I need to keep terms to order $t^2$? Probably not, but there wasn't much algebra involved in doing it, so it was harmless.

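Look at the other equation, for $y$.
$$y(t) = -v_t t + (v_0 + v_t)\frac{v_t}{g}\left[1 - \left(1 - \frac{gt}{v_t} + \frac{1}{2}\left(\frac{gt}{v_t}\right)^2 - \frac{1}{6}\left(\frac{gt}{v_t}\right)^3 + \cdots\right)\right] = v_0 t - \frac{1}{2}\left(1 + \frac{v_0}{v_t}\right)gt^2 + \frac{1}{6}\left(1 + \frac{v_0}{v_t}\right)\frac{g^2t^3}{v_t} + \cdots$$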




Now differentiate this approximate expression for $y$ with respect to time and you get the approximate expression for $v_y$. That means that everything appears internally consistent, and I haven't introduced any obvious error in the process of approximation.

What if the terminal speed is infinite, so there's no air resistance? The work to answer this is already done. Expanding $e^{-gt/v_t}$ for small time is the same as for large $v_t$, so you need only look back at the preceding two sets of equations and let $v_t \to \infty$. The result is precisely the equations (2.32), just as you should expect.
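
The same checks can be repeated numerically. The sketch below is only an illustration under stated assumptions: the function names, the sample speeds and times, and $g = 9.8\ \mathrm{m/s^2}$ are my own choices, not part of the problem. It evaluates the claimed solution, Eq. (2.33), at $t = 0$, at a long time, and with a very large terminal speed, and prints each result next to what it ought to be.

```python
import math

G = 9.8  # m/s^2, assumed value for illustration


def v_y(t, v0, vt):
    """Claimed velocity, Eq. (2.33)."""
    return -vt + (v0 + vt) * math.exp(-G * t / vt)


def y(t, v0, vt):
    """Claimed height, Eq. (2.33)."""
    return -vt * t + (v0 + vt) * (vt / G) * (1.0 - math.exp(-G * t / vt))


v0, vt = 20.0, 50.0  # hypothetical launch speed and terminal speed, m/s

# Initial conditions: expect v_y(0) = v0 and y(0) = 0.
print("v_y(0) =", v_y(0.0, v0, vt), " y(0) =", y(0.0, v0, vt))

# Long time: expect v_y to approach -vt.
print("v_y(100 s) =", v_y(100.0, v0, vt), " expected about", -vt)

# Very large terminal speed: expect the no-air-resistance results, Eq. (2.32).
t, big_vt = 1.5, 1.0e5
print("v_y:", v_y(t, v0, big_vt), " vs v0 - g t =", v0 - G * t)
print("y:  ", y(t, v0, big_vt), " vs v0 t - g t^2/2 =", v0 * t - 0.5 * G * t**2)
```

The large-$v_t$ comparison agrees only approximately, and the size of the leftover difference is itself informative: it shrinks like $1/v_t$, as the series expansion says it should.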

You can even determine something about the force that I assumed for the air resistance: $F_y = ma_y = m\,dv_y/dt$. Differentiate the approximate expression that you already have for $v_y$, then at least for small $t$
$$F_y = m\frac{d}{dt}\left[v_0 - \left(1 + \frac{v_0}{v_t}\right)gt + \frac{1}{2}\left(1 + \frac{v_0}{v_t}\right)\frac{g^2t^2}{v_t} + \cdots\right] = -m\left(1 + \frac{v_0}{v_t}\right)g + \cdots = -mg - mgv_0/v_t + \cdots \tag{2.34}$$


This says that the force appears to be (1) gravity plus (2) a force proportional to the initial velocity. The last fact comes from the factor $v_0$ in the second term of the force equation, and at time zero, that is the velocity. Does this imply that I assumed a force acting as $F_y = -mg - (\text{a constant times})\,v_y$? To this approximation that's the best guess. (It happens to be correct.) To verify it though, you would have to go back to the original un-approximated equations (2.33) and compute the force from them.


[Figure: two coaxial circular rings of radii $a$ and $b$ in parallel planes a distance $c$ apart.]

Electrostatics Example

Still another example, but from electrostatics this time: Two thin circular rings have radii $a$ and $b$ and carry charges $Q_1$ and $Q_2$ distributed uniformly around them. The rings are positioned in two parallel planes a distance $c$ apart and with axes coinciding. The problem is to compute the force of one ring on the other, and for the single non-zero component the answer is (perhaps)
$$F_z = \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\int_0^{\pi/2}\frac{d\theta}{\left[c^2 + (b-a)^2 + 4ab\sin^2\theta\right]^{3/2}} \tag{2.35}$$


Is this plausible? First check the dimensions! The integrand is (dimensionally) $1/(c^2)^{3/2} = 1/c^3$, where $c$ is one of the lengths. Combine this with the factors in front of the integral and one of the lengths ($c$'s) cancels, leaving $Q_1Q_2/\epsilon_0c^2$. This is (again dimensionally) the same as Coulomb's law, $Q_1Q_2/4\pi\epsilon_0r^2$, so it passes this test.

When you've done the dimensional check, start to consider the parameters that control the result. The numbers 
a, b, and c can be anything: small, large, or equal in any combination. For some cases you should be able to say what 
the answer will be, either approximately or exactly, and then check whether this complicated expression agrees with your 
expectation. 

If the rings shrink to zero radius this has $a = b = 0$, so $F_z$ reduces to
$$F_z \to \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\int_0^{\pi/2}\frac{d\theta}{c^3} = \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\,\frac{\pi}{2c^3} = \frac{Q_1Q_2}{4\pi\epsilon_0c^2}$$
and this is the correct expression for two point charges a distance $c$ apart.

If $c \gg a$ and $b$ then this is really not very different from the preceding case, where $a$ and $b$ are zero.

If $a = 0$ this is
$$F_z \to \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\int_0^{\pi/2}\frac{d\theta}{\left[c^2 + b^2\right]^{3/2}} = \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\,\frac{\pi/2}{\left[c^2 + b^2\right]^{3/2}} = \frac{Q_1Q_2c}{4\pi\epsilon_0\left[c^2 + b^2\right]^{3/2}} \tag{2.36}$$


The electric field on the axis of a ring is something that you can compute easily. The only component of the electric 
field at a point on the axis is itself along the axis. You can prove this by assuming that it’s false. Suppose that there’s a 
lateral component of E and say that it's to the right. Rotate everything by 180° about the axis and this component of 
E will now be pointing in the opposite direction. The ring of charge has not changed however, so E must be pointing in 
the original direction. This supposed sideways component is equal to minus itself, and something that's equal to minus 
itself is zero. 



2 — Infinite Series 




All the contributions to $E$ except those parallel to the axis add to zero. Along the axis each piece of charge $dq$ contributes the component
$$\frac{dq}{4\pi\epsilon_0\left[c^2 + b^2\right]}\cdot\frac{c}{\sqrt{c^2 + b^2}}$$
The first factor is the magnitude of the field of the point charge at a distance $r = \sqrt{c^2 + b^2}$ and the last factor is the cosine of the angle between the axis and $r$. Add all the $dq$ together and you get $Q_1$. Multiply that by $Q_2$ and you have the force on $Q_2$ and it agrees with the expression in Eq. (2.36).
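
These special cases can also be tested by brute force. The following Python sketch is an illustration only: it assumes a plain midpoint rule for the integral in Eq. (2.35), a rounded value of $\epsilon_0$, and made-up charges and distances, and the name `ring_force` is mine. It compares the numerical integral with the point-charge result and with the on-axis ring result.

```python
import math

EPS0 = 8.854e-12  # F/m, vacuum permittivity (rounded)


def ring_force(Q1, Q2, a, b, c, n=20000):
    """Midpoint-rule estimate of F_z from Eq. (2.35)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        denom = c**2 + (b - a) ** 2 + 4 * a * b * math.sin(theta) ** 2
        total += h / denom**1.5
    return Q1 * Q2 * c / (2 * math.pi**2 * EPS0) * total


Q1 = Q2 = 1.0e-6  # hypothetical charges, coulombs
c = 0.5           # hypothetical separation, meters
b = 0.3           # hypothetical radius of the second ring, meters

# Expected limits, written out from the text.
point_charges = Q1 * Q2 / (4 * math.pi * EPS0 * c**2)
on_axis = Q1 * Q2 * c / (4 * math.pi * EPS0 * (c**2 + b**2) ** 1.5)

# Each ratio should be 1, or very close to it.
print("a = b = 0   :", ring_force(Q1, Q2, 0.0, 0.0, c) / point_charges)
print("a = 0       :", ring_force(Q1, Q2, 0.0, b, c) / on_axis)
print("c >> a and b:", ring_force(Q1, Q2, 1e-3, 1e-3, c) / point_charges)
```

Agreement here does not prove Eq. (2.35), but a disagreement would have been decisive evidence against it.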

If $c \to 0$ then $F_z \to 0$ in Eq. (2.35). The rings are concentric and the outer ring doesn't push the inner ring either up or down.

But wait. In this case, where $c \to 0$, what if $a = b$? Then the force should approach infinity instead of zero because the two rings are being pushed into each other. If $a = b$ then
$$F_z = \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\int_0^{\pi/2}\frac{d\theta}{\left[c^2 + 4a^2\sin^2\theta\right]^{3/2}} \tag{2.37}$$
If you simply set $c = 0$ in this equation you get
$$F_z = \frac{Q_1Q_2\,0}{2\pi^2\epsilon_0}\int_0^{\pi/2}\frac{d\theta}{\left[4a^2\sin^2\theta\right]^{3/2}}$$


The numerator is zero, but look at the integral. The variable $\theta$ goes from 0 to $\pi/2$, and at the end near zero the integrand looks like
$$\frac{1}{\left[4a^2\sin^2\theta\right]^{3/2}} \approx \frac{1}{\left[4a^2\theta^2\right]^{3/2}} = \frac{1}{8a^3\theta^3}$$
Here I used the first term in the power series expansion of the sine. The integral near the zero end is then approximately
$$\frac{1}{8a^3}\int_0\frac{d\theta}{\theta^3} = \frac{1}{8a^3}\,\frac{-1}{2\theta^2}\bigg|_0$$
and that's infinite. This way to evaluate $F_z$ is indeterminate: $0\cdot\infty$ can be anything. It doesn't show that this $F_z$ gives the right answer, but it doesn't show that it's wrong either.

Estimating a tough integral

Although this is more difficult, even tricky, I'm going to show you how to examine this case for small values of $c$ and not for $c = 0$. The problem is in figuring out how to estimate the integral (2.37) for small $c$, and the key is to realize that the only place the integrand gets big is in the neighborhood of $\theta = 0$. The trick then is to divide the range of integration into two pieces


$$\int_0^{\pi/2}\frac{d\theta}{\left[c^2 + 4a^2\sin^2\theta\right]^{3/2}} = \int_0^{\Lambda} + \int_{\Lambda}^{\pi/2}$$

For any positive value of $\Lambda$ the second piece of the integral will remain finite even as $c \to 0$. This means that in trying to estimate the way that the whole integral approaches infinity I can ignore the second part of the integral. Now choose $\Lambda$ small enough that for $0 < \theta < \Lambda$ I can use the approximation $\sin\theta = \theta$, the first term in the series for sine. (Perhaps $\Lambda = 0.1$ or so.)
$$\text{for small } c,\qquad \int_0^{\pi/2}\frac{d\theta}{\left[c^2 + 4a^2\sin^2\theta\right]^{3/2}} \approx \int_0^{\Lambda}\frac{d\theta}{\left[c^2 + 4a^2\theta^2\right]^{3/2}} + \text{lower order terms}$$
This is an elementary integral. Let $\theta = (c/2a)\tan\phi$.
$$\int_0^{\Lambda}\frac{d\theta}{\left[c^2 + 4a^2\theta^2\right]^{3/2}} = \int_0^{\Lambda'}\frac{(c/2a)\sec^2\phi\,d\phi}{\left[c^2 + c^2\tan^2\phi\right]^{3/2}} = \frac{1}{2ac^2}\int_0^{\Lambda'}\cos\phi\,d\phi = \frac{1}{2ac^2}\sin\Lambda'$$
The limit $\Lambda'$ comes from $\Lambda = (c/2a)\tan\Lambda'$, so this implies $\tan\Lambda' = 2a\Lambda/c$. Now given the tangent of an angle, I want the sine; that's the first page of chapter one.
$$\sin\Lambda' = \frac{2a\Lambda/c}{\sqrt{1 + (2a\Lambda/c)^2}} = \frac{2a\Lambda}{\sqrt{c^2 + 4a^2\Lambda^2}}$$
As $c \to 0$, this approaches one. Put all of this together and you have the behavior of the integral in Eq. (2.37) for small $c$.
$$\int_0^{\pi/2}\frac{d\theta}{\left[c^2 + 4a^2\sin^2\theta\right]^{3/2}} \sim \frac{1}{2ac^2} + \text{lower order terms}$$
Insert this into Eq. (2.37) to get
$$F_z \sim \frac{Q_1Q_2c}{2\pi^2\epsilon_0}\cdot\frac{1}{2ac^2} = \frac{Q_1Q_2}{4\pi^2\epsilon_0ac}$$
Now why should I believe this any more than I believed the original integral? When you are very close to one of the rings, it will look like a long, straight line charge and the linear charge density on it is then $\lambda = Q_1/2\pi a$. What is the electric field of an infinitely long uniform line charge? $E_r = \lambda/2\pi\epsilon_0r$. So now at the distance $c$ from this line charge you know the $E$-field and to get the force on $Q_2$ you simply multiply this field by $Q_2$.
$$F_z \text{ should be } \frac{\lambda}{2\pi\epsilon_0c}\,Q_2 = \frac{Q_1Q_2}{4\pi^2\epsilon_0ac} \tag{2.38}$$

and that's exactly what I found in the preceding equation. After all these checks I think that I may believe the result, and 
more than that you begin to get an intuitive idea of what the result ought to look like. That’s at least as valuable. It’s 
what makes the difference between understanding the physics underlying a subject and simply learning how to manipulate 
the mathematics. 
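
One more check is available if you are willing to let a computer do the integral. The sketch below is an illustration under my own assumptions (a midpoint rule, a ring radius of 1, and the name `ring_integral`); it is not how the estimate above was derived. It multiplies the integral in Eq. (2.37) by $2ac^2$ for a sequence of decreasing $c$. If the small-$c$ estimate $1/(2ac^2)$ is right, the product should drift toward 1.

```python
import math


def ring_integral(a, c, n=100000):
    """Midpoint-rule estimate of the integral in Eq. (2.37)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += h / (c**2 + 4 * a**2 * math.sin(theta) ** 2) ** 1.5
    return total


a = 1.0  # hypothetical ring radius, arbitrary units

for c in (0.3, 0.1, 0.03, 0.01):
    # Ratio of the numerical integral to the claimed leading behavior 1/(2 a c^2).
    ratio = ring_integral(a, c) * 2 * a * c**2
    print(f"c = {c:5.2f}   integral * 2ac^2 = {ratio:.6f}")
```

The step count is chosen so that the narrow peak near $\theta = 0$, whose width is of order $c/2a$, is still covered by many sample points at the smallest $c$ used.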


Exercises 

1 Evaluate by hand cos 0.1 to four places. 

2 In the same way, evaluate $\tan 0.1$ to four places.

3 Use the first two terms of the binomial expansion to estimate $\sqrt{2} = \sqrt{1 + 1}$. What is the relative error? [(wrong - right)/right]

4 Same as the preceding exercise, but for $\sqrt{1.2}$.

5 What is the domain of convergence for $x - x^4 + x^9 - x^{4^2} + x^{5^2} - \cdots$?

6 Does $\sum_{n=0}^{\infty} \cos(n) - \cos(n+1)$ converge?

7 Does $\sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$ converge?

8 Does $\sum_{n=1}^{\infty} \frac{n!}{n^2}$ converge?

9 What is the domain of convergence for $\frac{x}{1\cdot 2} - \frac{x^2}{2\cdot 2^2} + \frac{x^3}{3\cdot 3^3} - \frac{x^4}{4\cdot 4^4} + \cdots$?

10 From Eq. (2.1), find a series for $\frac{1}{(1-x)^2}$.

11 If $x$ is positive, sum the series $1 + e^{-x} + e^{-2x} + e^{-3x} + \cdots$

12 What is the ratio of the exact value of 20! to Stirling's approximation for it? 

13 For the example in Eq. (2.22), what are the approximate values that would be predicted from Eq. (2.26)? 

14 Do the algebra to evaluate Eq. (2.25). 

15 Translate this into a question about infinite series and evaluate the two repeating decimal numbers: 0.444444..., 
0.987987987. . . 

16 What does the integral test tell you about the convergence of the infinite series $\sum n^{-p}$?

17 What would the power series expansion for the sine look like if you require it to be valid in arbitrary units, not just radians? This requires using the constant "C" as in section 1.1.

Problems

2.1 (a) If you borrow $200,000 to buy a house and will pay it back in monthly installments over 30 years at an annual interest rate of 6%, what is your monthly payment and what is the total money that you have paid (neglecting inflation)? To start, you have N payments p with monthly interest i and after all N payments your unpaid balance must reach zero. The initial loan is L and you pay at the end of each month.

((L(1 + i) - p)(1 + i) - p)(1 + i) - p ··· N times = 0

Now carry on and find the general expression for the monthly payment. Also find the total paid.

(b) Does your general result for arbitrary N reduce to the correct value if you pay everything back at the end of one month? [L(1 + i) = p]

(c) For general N, what does your result become if the interest rate is zero? Ans: $1199.10, $431676

2.2 In the preceding problem, suppose that there is an annual inflation of 2%. Now what is the total amount of money that you've paid in constant dollars? That is, one hundred dollars in the year 2010 is worth just $100/1.02^10 = $82.03 as expressed in year-2000 dollars. Each payment is paid with dollars of gradually decreasing value. Ans: $324211

2.3 Derive all the power series that you're supposed to memorize, Eq. (2.4). 

2.4 Sketch graphs of the functions
$$e^{-x^2},\quad xe^{-x^2},\quad x^2e^{-x^2},\quad e^{-|x|},\quad xe^{-|x|},\quad x^2e^{-|x|},\quad e^{-1/x},\quad e^{-1/x^2}$$

2.5 The sample series in Eq. (2.7) has a simple graph ($x^2$ between $-L$ and $+L$). Sketch graphs of one, two, three terms of this series to see if the graph is headed toward the supposed answer.

2.6 Evaluate this same Fourier series for $x^2$ at $x = L$; the answer is supposed to be $L^2$. Rearrange the result from the series and show that you can use it to evaluate $\zeta(2)$, Eq. (2.6). Ans: $\pi^2/6$

2.7 Determine the domain of convergence for all the series in Eq. (2.4). 

2.8 Determine the Taylor series for $\cosh x$ and $\sinh x$.

2.9 Working strictly by hand, evaluate \/0.999. Also \/50. Ans: Here's where a calculator can tell you better than I can. 
2.10 Determine the next, $x^6$, term in the series expansion of the secant. Ans: $61x^6/720$

2.11 The power series for the tangent is not as neat and simple as for the sine and cosine. You can derive it by taking successive derivatives as done in the text or you can use your knowledge of the series for the sine and cosine, and the geometric series.
$$\tan x = \frac{\sin x}{\cos x} = \frac{x - x^3/3! + \cdots}{1 - x^2/2! + \cdots} = \left[x - x^3/3! + \cdots\right]\left[1 - x^2/2! + \cdots\right]^{-1}$$
Use the expansion for the geometric series to place all the $x^2$, $x^4$, etc. terms into the numerator, treating every term after the "1" as a single small thing. Then collect the like powers to obtain the series at least through $x^5$.

Ans: $x + x^3/3 + 2x^5/15 + 17x^7/315 + \cdots$

2.12 What is the series expansion for $\csc x = 1/\sin x$? As in the previous problem, use your knowledge of the sine series and the geometric series to get this result at least through $x^5$. Note: the first term in this series is $1/x$.
Ans: $1/x + x/6 + 7x^3/360 + 31x^5/15120 + \cdots$

2.13 The exact relativistic expression for the kinetic energy of an object with non-zero mass is
$$K = mc^2(\gamma - 1) \quad\text{where}\quad \gamma = \left(1 - v^2/c^2\right)^{-1/2}$$
and $c$ is the speed of light in vacuum. If the speed $v$ is small compared to the speed of light, find an approximate expression for $K$ to show that it reduces to the Newtonian expression for the kinetic energy. Include the next term in the expansion to determine how large the speed $v$ must be in order that this correction term is 10% of the Newtonian expression for the kinetic energy. Ans: $v \approx 0.36\,c$

2.14 Use series expansions to evaluate
$$\lim_{x\to 0}\frac{1 - \cos x}{1 - \cosh x} \qquad\text{and}\qquad \lim_{x\to 0}\frac{\sin kx}{x}$$

2.15 Evaluate using series; you will need both the sine series and the binomial series.
$$\lim_{x\to 0}\left(\frac{1}{\sin^2 x} - \frac{1}{x^2}\right)$$
Now do it again, setting up the algebra differently and finding an easier (or harder) way. Ans: 1/3

2.16 For some more practice with series, evaluate
$$\lim_{x\to 0}\left(\frac{2}{x} + \frac{1}{1 - \sqrt{1+x}}\right)$$
Ans: Check experimentally with a few values of $x$ on a pocket calculator.

2.17 Expand the integrand to find the power series expansion for
$$\ln(1+x) = \int_0^x dt\,(1+t)^{-1}$$
Ans: Eq. (2.4)

2.18 (a) The error function $\operatorname{erf}(x)$ is defined by an integral. Expand the integrand, integrate term by term, and develop a power series representation for erf. For what values of $x$ does it converge? Evaluate erf(1) from this series and compare it to the result of problem 1.34. (b) Also, as further validation of the integral in problem 1.13, do the power series expansion of both sides of the equation and verify that the expansions of the two sides agree.

2.19 Verify that the combinatorial factor ${}_mC_n$ is really what results for the coefficients when you specialize the binomial series Eq. (2.4) to the case that the exponent is an integer.

2.20 Determine the double power series representation about $(0, 0)$ of $1/\left[(1 - x/a)(1 - y/b)\right]$

2.21 Determine the double power series representation about $(0, 0)$ of $1/(1 - x/a - y/b)$

2.22 Use a pocket calculator that can handle 100! and find the ratio of Stirling's approximation to the exact value. You may not be able to find the difference of two such large numbers. An improvement on the basic Stirling's formula is
$$\sqrt{2\pi n}\,n^n e^{-n}\left(1 + \frac{1}{12n}\right)$$
What is the ratio of approximate to exact for $n = 1, 2, 10$?
Ans: 0.99898, 0.99948, ...

2.23 Evaluate the sum $\sum_{n=1}^{\infty} 1/n(n+1)$. To do this, write the single term $1/n(n+1)$ as a combination of two fractions with denominator $n$ and $(n+1)$ respectively, then start to write out the stated infinite series to a few terms to see the pattern. When you do this you may be tempted to separate it into two series, of positive and of negative terms. Examine the problem of convergence and explain why this is wrong. Ans: 1

2.24 (a) You can sometimes use the result of the previous problem to improve the convergence of a slow-converging series. The sum $\sum_1^{\infty} 1/n^2$ converges, but not very fast. If you add zero to it you don't change the answer, but if you're clever about how you add it you can change this into a much faster converging series. Add $1 - \sum_1^{\infty} 1/n(n+1)$ to this series and combine the sums. (b) After Eq. (2.11) it says that it takes 120 terms to get the stated accuracy. Verify this. For the same accuracy, how many terms does this improved sum take? Ans: about 8 terms

2.25 The electric potential from one point charge is $kq/r$. For two point charges, you add the potentials of each: $kq_1/r_1 + kq_2/r_2$. Place a charge $-q$ at the origin; place a charge $+q$ at position $(x, y, z) = (0, 0, a)$. Write the total potential from these at an arbitrary position $P$ with coordinates $(x, y, z)$. Now suppose that $a$ is small compared to the distance of $P$ to the origin ($r = \sqrt{x^2 + y^2 + z^2}$) and expand your result to the first non-vanishing power of $a$, or really of $a/r$. This is the potential of an electric dipole. Also express your answer in spherical coordinates. See section 8.8 if you need. Ans: $kqa\cos\theta/r^2$

2.26 Do the previous problem, but with charge $-2q$ at the origin and charges $+q$ at each of the two points $(0, 0, a)$ and $(0, 0, -a)$. Again, you are looking for the potential at a point far away from the charges, and up to the lowest non-vanishing power of $a$. In effect you're doing a series expansion in $a/r$ and keeping the first surviving term. Also express the result in spherical coordinates. The angular dependence should be proportional to $P_2(\cos\theta) = \frac{3}{2}\cos^2\theta - \frac{1}{2}$, a "Legendre polynomial." The $r$ dependence will have a $1/r^3$ in it. This potential is that of a linear quadrupole.

2.27 The combinatorial factor Eq. (2.18) is supposed to be the number of different ways of choosing $n$ objects out of a set of $m$ objects. Explicitly verify that this gives the correct number of ways for $m = 1, 2, 3, 4$, and all $n$ from zero to $m$.

2.28 Pascal's triangle is a visual way to compute the values of ${}_mC_n$. Start with the single digit 1 on the top line. Every new line is computed by adding the two neighboring digits on the line above. (At the end of the line, treat the empty space as a zero.)

         1
        1 1
       1 2 1
      1 3 3 1

Write the next couple of lines of the triangle and then prove that this algorithm works, that is that the $m^{\text{th}}$ row is the ${}_mC_n$, where the top row has $m = 0$. Mathematical induction is the technique that I recommend.


2.29 Sum the series and show
$$\frac{1}{2!} + \frac{2}{3!} + \frac{3}{4!} + \cdots = 1$$

2.30 You know the power series representation for the exponential function, but now apply it in a slightly different context. Write out the power series for the exponential, but with an argument that is a differential operator. The letter $h$ represents some fixed number; interpret the square of $d/dx$ as $d^2/dx^2$ and find
$$e^{h\frac{d}{dx}}f(x)$$
Interpret the terms of the series and show that the value of this is $f(x + h)$.

2.31 The Doppler effect for sound with a moving source and for a moving observer have different formulas. The Doppler effect for light, including relativistic effects, is different still. Show that for low speeds they are all about the same.
$$f' = f\,\frac{v - v_o}{v} \qquad f' = f\,\frac{v}{v + v_s} \qquad f' = f\sqrt{\frac{1 - v/c}{1 + v/c}}$$
The symbols have various meanings: $v$ is the speed of sound in the first two, with the other terms being the velocity of the observer and the velocity of the source. In the third equation $c$ is the speed of light and $v$ is the velocity of the observer. And no, $1 = 1$ isn't good enough; you should get these at least to first order in the speed.

2.32 In the equation (2.30) for the light diffracted through a narrow slit, the width of the central maximum is dictated by the angle at the first dark region. How does this angle vary as you vary the width of the slit, $a$? What is this angle if $a = 0.1$ mm and $\lambda = 700$ nm? And how wide will the central peak be on a wall 5 meters from the slit? Take this width to be the distance between the first dark regions on either side of the center.

2.33 An object is a distance d below the surface of a medium with index of refraction n. (For example, water.) When 
viewed from directly above the object in air (i.e. use small angle approximation), the object appears to be a distance 
below the surface given by (maybe) one of the following expressions. Show why most of these expressions are implausible; 
that is, give reasons for eliminating the wrong ones without solving the problem explicitly. 

(1) $d\sqrt{1 + n^2}\big/n$ (2) $dn\big/\sqrt{1 + n^2}$ (3) $nd$ (4) $d/n$ (5) $dn^2\big/\sqrt{1 + n^2}$

2.34 A mass $m_1$ hangs from a string that is wrapped around a pulley of mass $M$. As the mass $m_1$ falls with acceleration $a_y$, the pulley rotates. An anonymous source claims that the acceleration of $m_1$ is one of the following answers. Examine them to determine if any is plausible. That is, examine each and show why it could not be correct. NOTE: solving the problem and then seeing if any of these agree is not what this is about.

(1) $a_y = Mg/(m_1 - M)$ (2) $a_y = Mg/(m_1 + M)$ (3) $a_y = m_1g/M$


2.35 Light travels from a point on the left ($p$) to a point on the right ($q$), and on the left it is in vacuum while on the right of the spherical surface it is in glass with an index of refraction $n$. The radius of the spherical surface is $R$ and you can parametrize the point on the surface by the angle $\theta$ from the center of the sphere. Compute the time it takes light to travel on the indicated path (two straight line segments) as a function of the angle $\theta$. Expand the time through second order in a power series in $\theta$ and show that the function $T(\theta)$ has a minimum if the distance $q$ is small enough, but that it switches to a maximum when $q$ exceeds a particular value. This position is the focus.

2.36 Combine two other series to get the power series in $\theta$ for $\ln(\cos\theta)$.

2.37 Subtract the series for $\ln(1 - x)$ and $\ln(1 + x)$. For what range of $x$ does this series converge? For what range of arguments of the logarithm does it converge?

Ans: $-1 < x < 1$, $0 < \arg < \infty$



2.38 A function is defined by the integral 




Expand the integrand with the binomial expansion and derive the power (Taylor) series representation for $f$ about $x = 0$. Also make a hyperbolic substitution to evaluate it in closed form.

2.39 Light travels from a point on the right ($p$), hits a spherically shaped mirror and goes to a point ($q$). The radius of the spherical surface is $R$ and you can parametrize the point on the surface by the angle $\theta$ from the center of the sphere. Compute the time it takes light to travel on the indicated path (two straight line segments) as a function of the angle $\theta$. Expand the time through second order in a power series in $\theta$ and show that the function $T(\theta)$ has a minimum if the distance $q$ is small enough, but that it switches to a maximum when $q$ exceeds a particular value. This is the focus.

2.40 (a) The quadratic equation $ax^2 + bx + c = 0$ is almost a linear equation if $a$ is small enough: $bx + c = 0 \Rightarrow x = -c/b$. You can get a more accurate solution iteratively by rewriting the equation as
$$x = -\frac{c}{b} - \frac{a}{b}x^2$$
Solve this by neglecting the second term, then with this approximate $x_1$ get an improved value of the root by
$$x_2 = -\frac{c}{b} - \frac{a}{b}x_1^2$$
and you can repeat the process. For comparison take the exact solution and do a power series expansion on it for small $a$. See if the results agree.

(b) Where does the other root come from? That value of $x$ is very large, so the first two terms in the quadratic are the big ones and must nearly cancel, $ax^2 + bx = 0$ so $x = -b/a$. Rearrange the equation so that you can iterate it, and compare the iterated solution to the series expansion of the exact solution.
$$x = -\frac{b}{a} - \frac{c}{ax}$$
Solve $0.001x^2 + x + 1 = 0$. Ans: Solve it exactly and compare.
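
The two iterations in this problem are easy to try numerically before you do the series comparison by hand. The sketch below is only an illustration: the function names and the fixed number of steps are my own assumptions. It iterates the two rearrangements $x = -c/b - (a/b)x^2$ and $x = -b/a - c/(ax)$ for the sample equation and prints the results next to the roots from the quadratic formula.

```python
import math

a, b, c = 0.001, 1.0, 1.0  # coefficients of the sample equation in the problem


def iterate_small_root(a, b, c, steps=5):
    """Iterate x = -c/b - (a/b) x^2, starting from the linear estimate -c/b."""
    x = -c / b
    for _ in range(steps):
        x = -c / b - (a / b) * x**2
    return x


def iterate_large_root(a, b, c, steps=5):
    """Iterate x = -b/a - c/(a x), starting from the estimate -b/a."""
    x = -b / a
    for _ in range(steps):
        x = -b / a - c / (a * x)
    return x


# Exact roots from the quadratic formula, for comparison.
disc = math.sqrt(b**2 - 4 * a * c)
exact_small = (-b + disc) / (2 * a)
exact_large = (-b - disc) / (2 * a)

print("small root: iterated", iterate_small_root(a, b, c), " exact", exact_small)
print("large root: iterated", iterate_large_root(a, b, c), " exact", exact_large)
```

Both iterations converge quickly here because $a$ is small; printing the intermediate values of `x` is a good way to see how many digits each step gains.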


2.41 Evaluate the limits 


$$\text{(a)}\ \lim_{x\to 0}\frac{\sin x - \tan x}{x} \qquad \text{(b)}\ \lim_{x\to 0}\frac{\sin x - \tan x}{x^2} \qquad \text{(c)}\ \lim_{x\to 0}\frac{\sin x - \tan x}{x^3}$$

Ans: Check with a pocket calculator for $x = 1.0$, 0.1, 0.01

2.42 Fill in the missing steps in the derivation of Eq. (2.26). 

2.43 Is the result in Eq. (2.26) normalized properly? What is its integral $d\delta$ over all $\delta$? Ans: 1

2.44 A political survey asks 1500 people randomly selected from the entire country whom they will vote for as dog-catcher-in-chief. The results are 49.0% for T.I. Hulk and 51.0% for T.A. Spiderman. Assume that these numbers are representative, an unbiased sample of the electorate. The number $0.49 \times 1500 = aN$ is now your best estimate for the number of votes Mr. Hulk will get in a sample of 1500. Given this estimate, what is the probability that Mr. Hulk will win the final vote anyway? (a) Use Eq. (2.26) to represent this estimate of the probability of his getting various possible outcomes, where the center of the distribution is at $k = aN$. Using $\delta = k - aN$, this probability function is proportional to $\exp\left(-\delta^2/2abN\right)$, and the probability of winning is the sum of all the probabilities of having $k > N/2$, that is, $\int_{N/2}^{\infty} dk$. (b) What would the answer be if the survey had asked 150 or 15000 people with the same 49-51 results?

Ans: (a) $\frac{1}{2}\left[1 - \operatorname{erf}\left(\sqrt{N/2ab}\,\left(\frac{1}{2} - a\right)\right)\right]$, 22%, (b) 40%, 0.7%

2.45 For the function defined in problem 2.38, what is its behavior near $x = 1$? Compare this result to equation (1.4). Note: the integral is $\int_0^{\Lambda} + \int_{\Lambda}^{x}$. Also, $1 - t^2 = (1 + t)(1 - t)$, and this $\approx 2(1 - t)$ near 1.

2.46 (a) What is the expansion of $1/(1 + t^2)$ in powers of $t$ for small $t$? (b) That was easy, now what is it for large $t$? In each case, what is the domain of convergence?

2.47 The "average" of two numbers $a$ and $b$ commonly means $(a + b)/2$, the arithmetic mean. There are many other averages however. ($a, b > 0$)
$$M_n(a, b) = \left[(a^n + b^n)/2\right]^{1/n}$$
is the $n^{\text{th}}$ mean, also called the power mean, and it includes many others as special cases. $n = 2$: root-mean-square, $n = -1$: harmonic mean. Show that this includes the geometric mean too: $\sqrt{ab} = \lim_{n\to 0} M_n(a, b)$. It can be shown that $dM_n/dn > 0$; what inequalities does this imply for various means? Ans: harmonic < geometric < arithmetic < rms

2.48 Using the definition in the preceding problem, show that $dM_n/dn > 0$. [Tough!]


2.49 In problem 2.18 you found the power series expansion for the error function, good for small arguments. Now what about large arguments?
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x dt\,e^{-t^2} = 1 - \frac{2}{\sqrt{\pi}}\int_x^{\infty} dt\,\frac{1}{t}\cdot te^{-t^2}$$
Notice that you can integrate the $te^{-t^2}$ factor explicitly, so integrate by parts. Then do it again and again. This provides a series in inverse powers that allows you to evaluate the error function for large arguments. What is erf(3)?
Ans: 0.9999779095 See Abramowitz and Stegun: 7.1.23.

2.50 A friend of mine got a different result for Eq. (2.35). Instead of $\sin^2\theta$ in the denominator, he got a $\sin\theta$. Analyze his answer for plausibility.

2.51 Find the minimum of the function $f(r) = ar + b/r$ for $a, b, r > 0$. Then find the series expansion of $f$ about that point, at least as far as the first non-constant term.

2.52 In problem 2.15 you found the limit of a function as $x \to 0$. Now find the behavior of the same function as a series expansion for small $x$, through terms in $x^2$. Ans: $\frac{1}{3} + \frac{1}{15}x^2$. To test whether this answer or yours or neither is likely to be correct, evaluate the exact and approximate values of this for moderately small $x$ on a pocket calculator.

2.53 Following Eq. (2.34) the tentative conclusion was that the force assumed for the air resistance was a constant 
times the velocity. Go back to the exact equations (2.33) and compute this force without approximation, showing that 
it is in fact a constant times the velocity. And of course find the constant. 

2.54 An object is thrown straight up with speed $v_0$. There is air resistance and the resulting equation for the velocity is claimed to be (only while it's going up)
$$v_y(t) = v_t\,\frac{v_0 - v_t\tan(gt/v_t)}{v_t + v_0\tan(gt/v_t)}$$
where $v_t$ is the terminal speed of the object after it turns around and has then been falling long enough. (a) Check whether this equation is plausible by determining if it reduces to the correct result if there is no air resistance and the terminal speed goes to infinity. (b) Now, what is the velocity for small time? Then use $F_y = ma_y$ to infer the probable speed dependence of what I assumed for the air resistance in deriving this expression. See problem 2.11 for the tangent series. (c) Use the exact $v_y(t)$ to show that no matter how large the initial speed is, it will stop in no more than some maximum time. For a bullet that has a terminal speed of 100 m/s, this is about 16 s.

2.55 Under the same circumstances as problem 2.54, the equation for position versus time is 

$$y(t) = \frac{v_t^2}{g}\,\ln\left(\frac{v_t\cos(gt/v_t) + v_0\sin(gt/v_t)}{v_t}\right)$$

(a) What is the behavior of this for small time? Analyze and interpret what it says and whether it behaves as it should. 

(b) At the time that it reaches its maximum height ($v_y = 0$), what is its position? Note that you don’t need to have an 
explicit value of $t$ for which this happens; you use the equation that $t$ satisfies. 

2.56 You can get the individual terms in the series Eq. (2.13) another way: multiply the two series: 

$$e^{ax^2 + bx} = e^{ax^2}\,e^{bx}$$

Do so and compare it to the few terms found after (2.13). 


Complex Algebra 

When the idea of negative numbers was broached a couple of thousand years ago, they were considered suspect, in 
some sense not “real.” Later, when probably one of the students of Pythagoras discovered that numbers such as $\sqrt{2}$ are 
irrational and cannot be written as a quotient of integers, legends have it that the discoverer suffered dire consequences. 
Now both negatives and irrationals are taken for granted as ordinary numbers of no special consequence. Why should 
$\sqrt{-1}$ be any different? Yet it was not until the middle 1800s that complex numbers were accepted as fully legitimate. 
Even then, it took the prestige of Gauss to persuade some. How can this be, when the general solution of a quadratic 
equation had been known for a long time? When it gave complex roots, the response was that those are meaningless 
and you can discard them. 

3.1 Complex Numbers 

As soon as you learn to solve a quadratic equation, you are confronted with complex numbers, but what is a complex 
number? If the answer involves $\sqrt{-1}$ then an appropriate response might be "What is that?" Yes, we can manipulate 
objects such as $-1 + 2i$ and get consistent results with them. We just have to follow certain rules, such as $i^2 = -1$. But 
is that an answer to the question? You can go through the entire subject of complex algebra and even complex calculus 
without learning a better answer, but it's nice to have a more complete answer once, if only to relax* and forget it. 

An answer to this question is to define complex numbers as pairs of real numbers, (a, b). These pairs are made 
subject to rules of addition and multiplication: 

$$(a, b) + (c, d) = (a + c,\; b + d) \quad\text{and}\quad (a, b)(c, d) = (ac - bd,\; ad + bc)$$

An algebraic system has to have something called zero, so that it plus any number leaves that number alone. Here that 
role is taken by $(0, 0)$: 

$$(0, 0) + (a, b) = (a + 0,\; b + 0) = (a, b) \quad\text{for all values of } (a, b)$$

What is the identity, the number such that it times any number leaves that number alone? 

$$(1, 0)(c, d) = (1 \cdot c - 0 \cdot d,\; 1 \cdot d + 0 \cdot c) = (c, d)$$


so $(1, 0)$ has this role. Finally, where does $\sqrt{-1}$ fit in? 

$$(0, 1)(0, 1) = (0 \cdot 0 - 1 \cdot 1,\; 0 \cdot 1 + 1 \cdot 0) = (-1, 0)$$

and the sum $(-1, 0) + (1, 0) = (0, 0)$, so $(0, 1)$ is the representation of $i = \sqrt{-1}$, that is, $i^2 + 1 = 0$. [$(0, 1)^2 + (1, 0) = (0, 0)$]. 

* If you think that this question is an easy one, you can read about some of the difficulties that the greatest 
mathematicians in history had with it: “An Imaginary Tale: The Story of $\sqrt{-1}$” by Paul J. Nahin. I recommend it. 

This set of pairs of real numbers satisfies all the desired properties that you want for complex numbers, so having 
shown that it is possible to express complex numbers in a precise way, I'll feel free to ignore this more cumbersome 
notation and to use the more conventional representation with the symbol $i$. 

$$(a, b) \longleftrightarrow a + ib$$

That complex number will in turn usually be represented by a single letter, such as $z$. 
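
If you want to see the pair rules at work without ever writing the symbol $i$, here is a minimal sketch in Python. The function names `add` and `mul` and the sample numbers are my own choices for illustration; nothing beyond the two rules quoted above is assumed.

```python
# Complex numbers modeled as ordered pairs of reals (a, b), using only the
# addition and multiplication rules stated in the text.
# The names add/mul and the sample values are illustrative assumptions.

def add(p, q):
    """(a, b) + (c, d) = (a + c, b + d)"""
    a, b = p
    c, d = q
    return (a + c, b + d)


def mul(p, q):
    """(a, b)(c, d) = (ac - bd, ad + bc)"""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)


zero, one, i = (0, 0), (1, 0), (0, 1)

print(mul(i, i))            # the pair that plays the role of i^2
print(add(mul(i, i), one))  # i^2 + 1 as a pair
print(mul((1, 2), (3, 4)))  # compare with the built-in complex product below
print((1 + 2j) * (3 + 4j))
```

Python's built-in `complex` type does the same bookkeeping, so the last two lines should describe the same number in the two notations.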
The graphical interpretation of complex numbers is the Cartesian geometry of 
the plane. The x and y in z = x+iy indicate a point in the plane, and the operations 
of addition and multiplication can be interpreted as operations in the plane. Addition 
of complex numbers is simple to interpret; it’s nothing more than common vector 
addition where you think of the point as being a vector from the origin. It reproduces 
the parallelogram law of vector addition. 

The magnitude of a complex number is defined in the same way that you 
define the magnitude of a vector in the plane. It is the distance to the origin using 
the Euclidean idea of distance. 

$$|z| = |x + iy| = \sqrt{x^2 + y^2}$$

The multiplication of complex numbers doesn't have such a familiar interpretation in the language of vectors. 
(And why should it?) 

3.2 Some Functions 

For the algebra of complex numbers I'll start with some simple looking questions of the sort that you know how to handle 
with real numbers. If $z$ is a complex number, what are $z^2$ and $\sqrt{z}$? Use $x$ and $y$ for real numbers here. 

$$z = x + iy, \quad\text{so}\quad z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy$$

That was easy, what about the square root? A little more work: 


$$\sqrt{z} = w \quad\text{means}\quad w^2 = z = x + iy. \tag{3.1}$$

If $z = x + iy$ and the unknown is $w = u + iv$ ($u$ and $v$ real) then 

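$$x + iy = u^2 - v^2 + 2iuv, \quad\text{so}\quad x = u^2 - v^2 \quad\text{and}\quad y = 2uv$$

These are two equations for the two unknowns $u$ and $v$, and the problem is now to solve them. 

$$v = \frac{y}{2u}, \quad\text{so}\quad x = u^2 - \frac{y^2}{4u^2}, \quad\text{or}\quad u^4 - xu^2 - \frac{y^2}{4} = 0$$

This is a quadratic equation for $u^2$. 

$$u^2 = \frac{x \pm \sqrt{x^2 + y^2}}{2}, \quad\text{then}\quad u = \pm\sqrt{\frac{x \pm \sqrt{x^2 + y^2}}{2}} \tag{3.2}$$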


Use v = y/2u and you have four roots with the four possible combinations of plus and minus signs. You’re supposed to 
get only two square roots, so something isn’t right yet; which of these four have to be thrown out? See problem 3.2. 

What is the reciprocal of a complex number? You can treat it the same way as you did the square root: solve for 
it. 

$$(x + iy)(u + iv) = 1, \quad\text{so}\quad xu - yv = 1, \quad xv + yu = 0$$

Solve the two equations for $u$ and $v$. The result is 

$$\frac{1}{z} = \frac{x - iy}{x^2 + y^2} \tag{3.3}$$


See problem 3.3. At least it's obvious that the dimensions are correct even before you verify the algebra. In both of 
these cases, the square root and the reciprocal, there is another way to do it, a much simpler way. That’s the subject of 
the next section. 
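
Before doing the algebra it costs nothing to try Eq. (3.3) on a number. The sketch below is only a numerical spot check, not a derivation; the function name `reciprocal` and the sample value of $z$ are assumptions of mine.

```python
# Numerical spot check of Eq. (3.3): 1/z = (x - iy)/(x^2 + y^2).
# The helper name and the sample z are illustrative choices.

def reciprocal(z):
    """Reciprocal of z built from its real and imaginary parts, as in Eq. (3.3)."""
    x, y = z.real, z.imag
    d = x * x + y * y          # |z|^2, a real number
    return complex(x / d, -y / d)


z = 3 - 4j
w = reciprocal(z)

print(w)        # candidate for 1/z
print(z * w)    # should be 1 up to rounding
print(1 / z)    # Python's own division, for comparison
```

The same two lines, $xu - yv = 1$ and $xv + yu = 0$, are what the product `z * w` is testing.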

Complex Exponentials 

A function that is central to the analysis of differential equations and to untold other mathematical ideas is the exponential, 
the familiar $e^x$. What is this function for complex values of the exponent? 

$$e^z = e^{x+iy} = e^x e^{iy} \tag{3.4}$$

This means that all that’s necessary is to work out the value for the purely imaginary exponent, and the general case is 
then just a product. There are several ways to work this out, and I'll pick what is probably the simplest. Use the series 
expansions Eq. (2.4) for the exponential, the sine, and the cosine and apply them to this function. 


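$$e^{iy} = 1 + iy + \frac{(iy)^2}{2!} + \frac{(iy)^3}{3!} + \frac{(iy)^4}{4!} + \cdots$$

$$= \left[1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots\right] + i\left[y - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right] = \cos y + i\sin y \tag{3.5}$$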


A few special cases of this are worth noting: $e^{i\pi/2} = i$, also $e^{i\pi} = -1$ and $e^{2i\pi} = 1$. In fact, $e^{2n\pi i} = 1$ for any integer $n$, so the 
exponential is a periodic function in the imaginary direction. 
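
You can watch the series do this. The following sketch sums the exponential series for a purely imaginary argument and compares it with $\cos y + i\sin y$; the function name `exp_series`, the number of terms, and the sample $y$ are assumptions made for the illustration.

```python
import cmath
import math

# Partial sums of the exponential series applied to an imaginary argument,
# compared with cos y + i sin y as in Eq. (3.5).
# exp_series, the 30-term cutoff and the sample y are illustrative choices.

def exp_series(z, terms=30):
    """Sum the first `terms` terms of 1 + z + z^2/2! + z^3/3! + ..."""
    total = 0j
    term = 1 + 0j            # current term z^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


y = 1.2
lhs = exp_series(1j * y)
rhs = complex(math.cos(y), math.sin(y))
print(lhs, rhs, abs(lhs - rhs))

# The special cases quoted in the text, using the library exponential.
for angle in (math.pi / 2, math.pi, 2 * math.pi):
    print(angle, cmath.exp(1j * angle))
```

The printed special cases come out as $i$, $-1$, and $1$ only up to rounding error, which is itself a useful reminder of what a pocket calculator does.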

The magnitude or absolute value of a complex number $z = x + iy$ is $r = \sqrt{x^2 + y^2}$. Combine this with the 
complex exponential and you have another way to represent complex numbers. 

$$z = x + iy = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta) = re^{i\theta} \tag{3.6}$$


This is the polar form of a complex number and $x + iy$ is the rectangular form of the same number. The magnitude is 
$|z| = r = \sqrt{x^2 + y^2}$. What is $\sqrt{i}$? Express it in polar form: $(e^{i\pi/2})^{1/2}$, or better, $(e^{i(2n\pi + \pi/2)})^{1/2}$. This is 

$$e^{i(n\pi + \pi/4)} = (e^{i\pi})^n e^{i\pi/4} = \pm(\cos\pi/4 + i\sin\pi/4) = \pm\frac{1 + i}{\sqrt{2}}$$

[Figure: $i$ at angle $\pi/2$ and $\sqrt{i}$ at angle $\pi/4$ in the complex plane.] 




3.3 Applications of Euler's Formula 

When you are adding or subtracting complex numbers, the rectangular form is more convenient, but when you're 
multiplying or taking powers the polar form has advantages. 

$$z_1 z_2 = r_1 e^{i\theta_1}\, r_2 e^{i\theta_2} = r_1 r_2\, e^{i(\theta_1 + \theta_2)} \tag{3.7}$$

Putting it into words, you multiply the magnitudes and add the angles in polar form. 
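
Here is a small check of that sentence using Python's standard `cmath` module, whose `rect` and `polar` functions convert between the two forms. The magnitudes and angles are arbitrary sample values that I picked so that the summed angle stays below $\pi$ and no wrap-around occurs.

```python
import cmath

# Multiply two numbers given in polar form and read the product back in
# polar form: the magnitudes should multiply and the angles should add.
# r1, t1, r2, t2 are arbitrary sample values (assumptions for illustration).

r1, t1 = 2.0, 0.7
r2, t2 = 1.5, 2.0

z1 = cmath.rect(r1, t1)      # r1 * e^{i t1} in rectangular form
z2 = cmath.rect(r2, t2)

r, theta = cmath.polar(z1 * z2)
print(r, r1 * r2)            # magnitudes multiply
print(theta, t1 + t2)        # angles add
```

`cmath.polar` returns the angle in the range $-\pi$ to $\pi$, so for larger angles you would have to compare the results modulo $2\pi$.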

From this you can immediately deduce some of the common trigonometric identities. Use Euler’s formula in the 
preceding equation and write out the two sides. 

$$r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2) = r_1 r_2\left[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\right]$$

The factors $r_1$ and $r_2$ cancel. Now multiply the two binomials on the left and match the real and the imaginary parts to 
the corresponding terms on the right. The result is the pair of equations 

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 \quad\text{and}\quad \sin(\theta_1 + \theta_2) = \cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2 \tag{3.8}$$

and you have a much simpler than usual derivation of these common identities. You can do similar manipulations for 
other trigonometric identities, and in some cases you will encounter relations for which there's really no other way to 
get the result. That is why you will find that in physics applications where you might use sines or cosines (oscillations, 
waves) no one uses anything but complex exponentials. Get used to it. 

The trigonometric functions of complex argument follow naturally from these. 

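$$e^{i\theta} = \cos\theta + i\sin\theta, \quad\text{so, for negative angle}\quad e^{-i\theta} = \cos\theta - i\sin\theta$$

Add these and subtract these to get 

$$\cos\theta = \frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right) \quad\text{and}\quad \sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right) \tag{3.9}$$

What is this if $\theta = iy$? 

$$\cos iy = \frac{1}{2}\left(e^{-y} + e^{+y}\right) = \cosh y \quad\text{and}\quad \sin iy = \frac{1}{2i}\left(e^{-y} - e^{+y}\right) = i\sinh y \tag{3.10}$$

Apply the addition formulas of Eq. (3.8) to the sum $x + iy$: 

$$\cos(x + iy) = \cos x\cos iy - \sin x\sin iy = \cos x\cosh y - i\sin x\sinh y \quad\text{and}$$

$$\sin(x + iy) = \sin x\cosh y + i\cos x\sinh y \tag{3.11}$$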

You can see from this that the sine and cosine of complex angles can be real and larger than one. The hyperbolic functions 
and the circular trigonometric functions are now the same functions. You're just looking in two different directions in 
the complex plane. It's as if you are changing from the equation of a circle, $x^2 + y^2 = R^2$, to that of a hyperbola, 
$x^2 - y^2 = R^2$. Compare this to the hyperbolic functions at the beginning of chapter one. 
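
A numerical spot check of Eq. (3.11) is easy with the standard `cmath` and `math` modules. The sample values of $x$ and $y$ are arbitrary assumptions; the last lines show a cosine that is real and larger than one.

```python
import cmath
import math

# Compare the library cosine and sine of a complex angle with the
# right-hand sides of Eq. (3.11). x and y are arbitrary sample values.

x, y = 0.8, 1.3
z = complex(x, y)

cos_lhs = cmath.cos(z)
cos_rhs = complex(math.cos(x) * math.cosh(y), -math.sin(x) * math.sinh(y))

sin_lhs = cmath.sin(z)
sin_rhs = complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y))

print(abs(cos_lhs - cos_rhs), abs(sin_lhs - sin_rhs))

# A purely imaginary angle: cos(2i) = cosh 2, a real number bigger than one.
big = cmath.cos(2j)
print(big, math.cosh(2))
```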

Equation (3.9) doesn’t require that $\theta$ itself be real; call it $z$. Then what is $\sin^2 z + \cos^2 z$? 

$$\cos z = \frac{1}{2}\left(e^{iz} + e^{-iz}\right) \quad\text{and}\quad \sin z = \frac{1}{2i}\left(e^{iz} - e^{-iz}\right)$$

$$\cos^2 z + \sin^2 z = \frac{1}{4}\left[e^{2iz} + e^{-2iz} + 2 - e^{2iz} - e^{-2iz} + 2\right] = 1$$

The polar form shows a geometric interpretation for the periodicity of the exponential: $e^{i(\theta + 2\pi)} = e^{i\theta} = e^{i(\theta + 2k\pi)}$. 

In the picture, you’re going around a circle and coming back to the same point. If the angle $\theta$ is negative you're just 
going around in the opposite direction. An angle of $-\pi$ takes you to the same point as an angle of $+\pi$. 

Complex Conjugate 

The complex conjugate of a number $z = x + iy$ is the number $z^* = x - iy$. Another common notation is $\bar{z}$. The product 
$z^* z$ is $(x - iy)(x + iy) = x^2 + y^2$ and that is $|z|^2$, the square of the magnitude of $z$. You can use this to rearrange 
complex fractions, combining the various terms with $i$ in them and putting them in one place. This is best shown by 
some examples. 

$$\frac{3 + 5i}{2 + 3i} = \frac{(3 + 5i)(2 - 3i)}{(2 + 3i)(2 - 3i)} = \frac{21 + i}{13}$$
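
The same rearrangement can be spelled out in a few lines of Python, using the built-in `conjugate` method of complex numbers. This is a sketch for checking arithmetic like the example above; the variable names are my own.

```python
# Clear the i from a denominator by multiplying top and bottom by the
# complex conjugate of the denominator, as in the (3 + 5i)/(2 + 3i) example.
# Variable names are illustrative.

num = 3 + 5j
den = 2 + 3j

top = num * den.conjugate()            # (3 + 5i)(2 - 3i)
bottom = (den * den.conjugate()).real  # |den|^2, a real number

rationalized = top / bottom
print(top, bottom)
print(rationalized, num / den)         # the two should agree
```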

What happens when you add the complex conjugate of a number to the number, $z + z^*$? 

What happens when you subtract the complex conjugate of a number from the number? 

If one number is the complex conjugate of another, how do their squares compare? 

What about their cubes? 

What about $z + z^2$ and $z^* + z^{*2}$? 

What about comparing $e^z = e^{x+iy}$ and $e^{z^*}$? 

What is the product of a number and its complex conjugate written in polar form? 

Compare $\cos z$ and $\cos z^*$. 

What is the quotient of a number and its complex conjugate? 

What about the magnitude of the preceding quotient? 

Examples 

Simplify these expressions, making sure that you can do all of these manipulations yourself. 


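$$\frac{3 - 4i}{2 - i} = \frac{(3 - 4i)(2 + i)}{(2 - i)(2 + i)} = \frac{10 - 5i}{5} = 2 - i$$

$$(3i + 1)^2\left[\frac{1}{2 - i} + \frac{3i}{2 + i}\right] = (-8 + 6i)\,\frac{(2 + i) + 3i(2 - i)}{(2 - i)(2 + i)} = (-8 + 6i)\,\frac{5 + 7i}{5} = \frac{-82 - 26i}{5}$$

$$\frac{i^3 + i^{10} + i}{i^2 + i^{137} + 1} = \frac{(-i) + (-1) + i}{(-1) + (i) + (1)} = \frac{-1}{i} = i$$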

Manipulate these using the polar form of the numbers, though in some cases you can do it either way. 


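$$\sqrt{i} = \left(e^{i\pi/2}\right)^{1/2} = e^{i\pi/4} = \frac{1 + i}{\sqrt{2}}$$

$$\left(\frac{1 - i}{1 + i}\right)^3 = \left(\frac{\sqrt{2}\,e^{-i\pi/4}}{\sqrt{2}\,e^{i\pi/4}}\right)^3 = \left(e^{-i\pi/2}\right)^3 = e^{-3i\pi/2} = i$$

$$\left(\frac{2i}{1 + i\sqrt{3}}\right)^{25} = \left(\frac{2e^{i\pi/2}}{2\left(\frac{1}{2} + i\frac{1}{2}\sqrt{3}\right)}\right)^{25} = \left(\frac{2e^{i\pi/2}}{2e^{i\pi/3}}\right)^{25} = \left(e^{i\pi/6}\right)^{25} = e^{i\pi(4 + 1/6)} = \frac{1}{2}\left(\sqrt{3} + i\right)$$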


Roots of Unity 

What is the cube root of one? One of course, but not so fast; there are three cube roots, and you can easily find all of 
them using complex exponentials. 

$$1 = e^{2k\pi i}, \quad\text{so}\quad 1^{1/3} = \left(e^{2k\pi i}\right)^{1/3} = e^{2k\pi i/3} \tag{3.12}$$

and $k$ is any integer. $k = 0, 1, 2$ give 

$$1^{1/3} = 1, \qquad e^{2\pi i/3} = \cos(2\pi/3) + i\sin(2\pi/3) = -\frac{1}{2} + i\frac{\sqrt{3}}{2}, \qquad e^{4\pi i/3} = \cos(4\pi/3) + i\sin(4\pi/3) = -\frac{1}{2} - i\frac{\sqrt{3}}{2}$$

and other positive or negative integers $k$ just keep repeating these three values. 



[Figure: the 5th roots of 1] 

The roots are equally spaced around the unit circle. If you want the $n$th root, you do the same sort of calculation: 
the $1/n$ power and the integers $k = 0, 1, 2, \ldots, (n - 1)$. These are $n$ points, and the angles between adjacent ones are 
equal. 
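
The recipe translates directly into a few lines of Python. The function name `roots_of_unity` is an assumption of mine; the formula inside it is Eq. (3.12) with $3$ replaced by $n$.

```python
import cmath
import math

# The n-th roots of one as e^{2 pi i k / n} for k = 0, 1, ..., n - 1.
# roots_of_unity is an illustrative name, not anything from the text.

def roots_of_unity(n):
    """Return the n complex n-th roots of 1, in order of increasing angle."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


cube_roots = roots_of_unity(3)
for w in cube_roots:
    print(w, w ** 3)        # each cube should be 1 up to rounding

fifth_roots = roots_of_unity(5)
print([abs(w) for w in fifth_roots])   # all on the unit circle
```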


3.4 Geometry 

Multiply a number by 2 and you change its length by that factor. 

Multiply it by i and you rotate it counterclockwise by 90° about the origin. 

Multiply it by $i^2 = -1$ and you rotate it by 180° about the origin. (Either direction: $i^2 = (-i)^2$) 

The Pythagorean Theorem states that if you construct three squares from the three sides of a right triangle, the 
sum of the two areas on the shorter sides equals the area of the square constructed on the hypotenuse. What happens 
if you construct four squares on the four sides of an arbitrary quadrilateral? 

Represent the four sides of the quadrilateral by four complex numbers that add to zero. Start from the origin 
and follow the complex number $a$. Then follow $b$, then $c$, then $d$. The result brings you back to the origin. Place four 
squares on the four sides and locate the centers of those squares: $P_1, P_2, \ldots$ Draw lines between these points as shown. 

These lines are orthogonal and have the same length. Stated in the language of complex numbers, this is 

$$P_1 - P_3 = i(P_2 - P_4) \tag{3.13}$$

$$a + b + c + d = 0$$
$$\tfrac{1}{2}a + \tfrac{1}{2}ia = P_1$$
$$a + \tfrac{1}{2}b + \tfrac{1}{2}ib = P_2$$




Pick the origin at one corner, then construct the four center points $P_1, P_2, P_3, P_4$ as complex numbers, following the pattern 
shown above for the first two. E.g., you get to $P_1$ from the origin by going halfway along $a$, turning left, then going the 
distance $|a|/2$. Now write out the two complex numbers $P_1 - P_3$ and $P_2 - P_4$ and finally manipulate them by using the 
defining equation for the quadrilateral, $a + b + c + d = 0$. The result is the stated theorem. See problem 3.54. 
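
A numerical test on a random quadrilateral is no substitute for the algebra, but it tells you whether the algebra is worth attempting. In the sketch below the expressions for $P_3$ and $P_4$ are my own continuation of the pattern given for $P_1$ and $P_2$, and the function name and random seed are arbitrary.

```python
import random

# Numerical test of Eq. (3.13), P1 - P3 = i (P2 - P4), for a random
# quadrilateral with sides a, b, c, d that add to zero.
# P3 and P4 continue the pattern stated for P1 and P2 (an assumption of
# this sketch); square_centers and the seed are illustrative choices.

def square_centers(a, b, c, d):
    """Centers of the squares erected on the four sides, origin at one corner."""
    half = (1 + 1j) / 2          # halfway along a side, then a left turn
    p1 = half * a
    p2 = a + half * b
    p3 = a + b + half * c
    p4 = a + b + c + half * d
    return p1, p2, p3, p4


random.seed(0)
a, b, c = (complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3))
d = -(a + b + c)                 # the sides must close: a + b + c + d = 0

p1, p2, p3, p4 = square_centers(a, b, c, d)
print(p1 - p3)
print(1j * (p2 - p4))            # should match the line above
```

Multiplying by $i$ is a rotation by 90° that leaves the length unchanged, so agreement of the two printed numbers is exactly the statement that the lines are perpendicular and equally long.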


3.5 Series of cosines 

There are standard identities for the cosine and sine of the sum of angles and less familiar ones for the sum of two cosines 
or sines. You can derive that latter sort of equations using Euler's formula and a little manipulation. The sum of two 
cosines is the real part of $e^{ix} + e^{iy}$, and you can use simple identities to manipulate these into a useful form. 

$$x = \tfrac{1}{2}(x + y) + \tfrac{1}{2}(x - y) \quad\text{and}\quad y = \tfrac{1}{2}(x + y) - \tfrac{1}{2}(x - y)$$

See problems 3.34 and 3.35 to complete these. 

What if you have a sum of many cosines or sines? Use the same basic ideas of the preceding manipulations, and 
combine them with some of the techniques for manipulating series. 

$$1 + \cos\theta + \cos 2\theta + \cdots + \cos N\theta = 1 + e^{i\theta} + e^{2i\theta} + \cdots + e^{Ni\theta} \qquad\text{(Real part)}$$

The last series is geometric, so it is nothing more than Eq. (2.3). 

$$1 + e^{i\theta} + \left(e^{i\theta}\right)^2 + \left(e^{i\theta}\right)^3 + \cdots + \left(e^{i\theta}\right)^N = \frac{1 - e^{i(N+1)\theta}}{1 - e^{i\theta}}$$

$$= \frac{e^{i(N+1)\theta/2}\left(e^{-i(N+1)\theta/2} - e^{i(N+1)\theta/2}\right)}{e^{i\theta/2}\left(e^{-i\theta/2} - e^{i\theta/2}\right)} = e^{iN\theta/2}\,\frac{\sin\left((N+1)\theta/2\right)}{\sin(\theta/2)} \tag{3.14}$$

From this you now extract the real part and the imaginary part, thereby obtaining the series you want (plus another one, 
the series of sines). These series appear when you analyze the behavior of a diffraction grating. Naturally you have to 
check the plausibility of these results; do the answers work for small $\theta$? 
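
One more plausibility check is to add up the cosines directly and compare with the real part of Eq. (3.14), which is $\cos(N\theta/2)\sin((N+1)\theta/2)/\sin(\theta/2)$. The function names and the sample values of $\theta$ and $N$ below are assumptions for the illustration.

```python
import math

# Direct sum 1 + cos(theta) + ... + cos(N theta) versus the real part of
# Eq. (3.14). Function names and the sample theta, N are illustrative.

def cosine_sum_direct(theta, N):
    """Add the N + 1 cosines term by term."""
    return sum(math.cos(k * theta) for k in range(N + 1))


def cosine_sum_closed(theta, N):
    """Real part of e^{i N theta/2} sin((N+1) theta/2) / sin(theta/2)."""
    return math.cos(N * theta / 2) * math.sin((N + 1) * theta / 2) / math.sin(theta / 2)


theta, N = 0.3, 7
print(cosine_sum_direct(theta, N), cosine_sum_closed(theta, N))

# For small theta every cosine is close to 1, so the sum should approach N + 1.
print(cosine_sum_closed(1e-6, N))
```

The closed form divides by $\sin(\theta/2)$, so it cannot be evaluated at $\theta = 0$ exactly; that is the limit the text asks you to examine by hand.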

3.6 Logarithms 

The logarithm is the inverse function for the exponential. If $e^w = z$ then $w = \ln z$. To determine what this is, let 

$$w = u + iv \quad\text{and}\quad z = re^{i\theta}, \quad\text{then}\quad e^{u+iv} = e^u e^{iv} = re^{i\theta}$$

This implies that $e^u = r$ and so $u = \ln r$, but it doesn't imply $v = \theta$. Remember the periodic nature of the exponential 
function? $e^{i\theta} = e^{i(\theta + 2n\pi)}$, so you can conclude instead that $v = \theta + 2n\pi$. 

$$\ln z = \ln\left(re^{i\theta}\right) = \ln r + i(\theta + 2n\pi) \tag{3.15}$$

has an infinite number of possible values. Is this bad? You're already familiar with the square root function, and that 
has two possible values, $\pm$. This just carries the idea farther. For example $\ln(-1) = i\pi$ or $3i\pi$ or $-7i\pi$ etc. As with 
the square root, the specific problem that you're dealing with will tell you which choice to make. 
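
Library logarithms have to make this choice for you. Python's `cmath.log` returns the value whose imaginary part lies between $-\pi$ and $\pi$; the other values of Eq. (3.15) follow by adding multiples of $2\pi i$. The particular integers $n$ below are picked only to reproduce the examples in the text.

```python
import cmath
import math

# Several values of ln(-1): the library's choice plus multiples of 2 pi i.
# The list of n values is an illustrative choice matching i*pi, 3i*pi, -7i*pi.

z = complex(-1, 0)
principal = cmath.log(z)                 # the library's choice of branch

branches = [principal + 2j * math.pi * n for n in (0, 1, -4)]
for w in branches:
    print(w, cmath.exp(w))               # every one exponentiates back to -1
```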

A sample graph of the logarithm in the complex plane is $\ln(1 + it)$ as $t$ varies from $-\infty$ to $+\infty$. 

[Figure: graph of $\ln(1 + it)$, with the imaginary axis marked at $i\pi/2$ and $-i\pi/2$.] 

3.7 Mapping 

When you apply a complex function to a region in the plane, it takes that region into another region. When you look 
at this as a geometric problem you start to get some very pretty and occasionally useful results. Start with a simple 
example, 

$$w = f(z) = e^z = e^{x+iy} = e^x e^{iy} \tag{3.16}$$

If $y = 0$ and $x$ goes from $-\infty$ to $+\infty$, this function goes from 0 to $\infty$. 

If $y$ is $\pi/4$ and $x$ goes over this same range of values, $f$ goes from 0 to infinity along the ray at angle $\pi/4$ above the 
axis. 

At any fixed $y$, the horizontal line parallel to the $x$-axis is mapped to the ray that starts at the origin and goes out to 
infinity. 

The strip from $-\infty < x < +\infty$ and $0 < y < \pi$ is mapped into the upper half plane. 



The line B from $-\infty + i\pi/6$ to $+\infty + i\pi/6$ is mapped onto the ray B from the origin along the angle $\pi/6$. 
For comparison, what is the image of the same strip under a different function? Try 

$$w = f(z) = z^2 = x^2 - y^2 + 2ixy$$


The image of the line of fixed $y$ is a parabola. The real part of $w$ has an $x^2$ in it while the imaginary part is linear in $x$. 

That is the representation of a parabola. The image of the strip is the region among the lines below. 

[Figure: the parabolic images of the lines labeled B through G.] 


Pretty yes, but useful? In certain problems in electrostatics and in fluid flow, it is possible to use complex algebra 
to map one region into another, with the accompanying electric fields and potentials or respectively fluid flows mapped 
from a complicated problem into a simple one. Then you can map the simple solution back to the original problem and 
you have your desired solution to the original problem. Easier said than done. It’s the sort of method that you can learn 
about when you find that you need it. 

Exercises 

1 Express in the form $a + ib$: $(3 - i)^2$, $(2 - 3i)(3 + 4i)$. Draw the geometric representation for each calculation. 

2 Express in polar form, $re^{i\theta}$: $-2$, $3i$, $3 + 3i$. Draw the geometric representation for each. 

3 Show that $(1 + 2i)(3 + 4i)(5 + 6i)$ satisfies the associative law of multiplication. I.e. multiply the first pair first or multiply 
the second pair first, no matter. 

4 Solve the equation $z^2 - 2z + c = 0$ and plot the roots as points in the complex plane. Do this as the real number $c$ 
moves from $c = 0$ to $c = 2$. 

5 Now show that $(a + bi)\left[(c + di)(e + fi)\right] = \left[(a + bi)(c + di)\right](e + fi)$. After all, just because real numbers satisfy 
the associative law of multiplication it isn't immediately obvious that complex numbers do too. 

6 Given $z_1 = 2e^{i60^\circ}$ and $z_2 = 4e^{i120^\circ}$, evaluate $z_1^2$, $z_1 z_2$, $z_2/z_1$. Draw pictures too. 

7 Evaluate $\sqrt{i}$ using the rectangular form, Eq. (3.2), and compare it to the result you get by using the polar form. 

8 Given $f(z) = z^2 + z + 1$, evaluate $f(3 + 2i)$, $f(3 - 2i)$. 

9 For the same $f$ as the preceding exercise, what are $f'(3 + 2i)$ and $f'(3 - 2i)$? 

10 Do the arithmetic and draw the pictures of these computations: 

$(3 + 2i) + (-1 + i)$, $(3 + 2i) - (-1 + i)$, $(-4 + 3i) - (4 + i)$, $-5 + (3 - 5i)$ 

11 Show that the real part of $z$ is $(z + z^*)/2$. Find a similar expression for the imaginary part of $z$. 

12 What is $i^n$ for integer $n$? Draw the points in the complex plane for a variety of positive and negative $n$. 

13 What is the magnitude of $(4 + 3i)/(3 - 4i)$? What is its polar angle? 

14 Evaluate $(1 + i)^{19}$. 

15 What is $\sqrt{1 - i}$? Do this by the method of Eq. (3.2). 

16 What is $\sqrt{1 - i}$? Do this by the method of Eq. (3.6). 

17 Sketch a plot of the curve $z = \alpha e^{i\alpha}$ as the real parameter $\alpha$ varies from zero to infinity. Does the behavior of your 
sketch conform to the small $\alpha$ behavior of the function? (And when no one's looking you can plug in a few numbers for 
$\alpha$ to see what this behavior is.) 

18 Verify the graph following Eq. (3.15). 

Problems 

3.1 Pick a pair of complex numbers and plot them in the plane. Compute their product and plot that point. Do this 
for several pairs, trying to get a feel for how complex multiplication works. When you do this, be sure that you’re not 
simply repeating yourself. Place the numbers in qualitatively different places. 

3.2 In the calculation of the square root of a complex number, Eq. (3.2), I found four roots instead of two. Which ones 
don't belong? Do the other two expressions have any meaning? 

3.3 Finish the algebra in computing the reciprocal of a complex number, Eq. (3.3). 

3.4 Pick a complex number and plot it in the plane. Compute its reciprocal and plot it. Compute its square and square 
root and plot them. Do this for several more (qualitatively different) examples. 

3.5 Plot $e^{ct}$ in the plane where $c$ is a complex constant of your choosing and the parameter $t$ varies over $0 < t < \infty$. 
Pick another couple of values for $c$ to see how the resulting curves change. Don't pick values that simply give results 
that are qualitatively the same; pick values sufficiently varied so that you can get different behavior. If in doubt about 
how to plot these complex numbers as functions of $t$, pick a few numerical values: e.g. $t = 0.01, 0.1, 0.2, 0.3$, etc. 
Ans: Spirals or straight lines, depending on where you start 

3.6 Plot $\sin ct$ in the plane where $c$ is a complex constant of your choosing and the parameter $t$ varies over $0 < t < \infty$. 
Pick another couple of qualitatively different values for $c$ to see how the resulting curves change. 

3.7 Solve the equation $z^2 + iz + 1 = 0$ 

3.8 Just as Eq. (3.11) presents the circular functions of complex arguments, what are the hyperbolic functions of complex 
arguments? 

3.9 From $\left(e^{ix}\right)^3$, deduce trigonometric identities for the cosine and sine of triple angles in terms of single angles. 
Ans: $\cos 3x = \cos x - 4\sin^2 x\cos x = 4\cos^3 x - 3\cos x$ 

3.10 For arbitrary integer $n > 1$, compute the sum of all the $n$th roots of one. (When in doubt, try $n = 2, 3, 4$ first.) 

3.11 Either solve for $z$ in the equation $e^z = 0$ or prove that it can't be done. 

3.12 Evaluate $z/z^*$ in polar form. 

3.13 From the geometric picture of the magnitude of a complex number, the set of points $z$ defined by $|z - z_0| = R$ is 
a circle. Write it out in rectangular components to see what this is in conventional Cartesian coordinates. 

3.14 An ellipse is the set of points $z$ such that the sum of the distances to two fixed points is a constant: $|z - z_1| + |z - z_2| = 2a$. 
Pick the two points to be $z_1 = -f$ and $z_2 = +f$ on the real axis ($f < a$). Write $z$ as $x + iy$ and manipulate 
this equation for the ellipse into a simple standard form. I suggest that you leave everything in terms of complex numbers 
($z$, $z^*$, $z_1$, $z_1^*$, etc.) until some distance into the problem. Use $x + iy$ only after it becomes truly useful to do so. 

3.15 Repeat the previous problem, but for the set of points such that the difference of the distances from two fixed 
points is a constant. 

3.16 There is a vertical line $x = -f$ and a point on the $x$-axis $z_0 = +f$. Find the set of points $z$ so that the distance 
to $z_0$ is the same as the perpendicular distance to the line $x = -f$. 

3.17 Sketch the set of points $|z - 1| < 1$. 

3.18 Simplify the numbers 

1 + i -l + n/3 i 5 +i 3 fy/3 + A 2 

1 +1 + iVs' \J 2>\[% — 7v^l7 — 4z 5 V l + i ) 


3.19 Express in polar form; include a sketch in each case. 

$2 - 2i$, $\sqrt{3} + i$, $-\sqrt{5}\,i$, $-17 - 23i$ 

3.20 Take two complex numbers; express them in polar form, and subtract them. 

$$z_1 = r_1 e^{i\theta_1}, \quad z_2 = r_2 e^{i\theta_2}, \quad\text{and}\quad z_3 = z_2 - z_1$$

Compute $z_3^* z_3$, the magnitude squared of $z_3$, and so derive the law of cosines. You did draw a picture didn’t you? 

3.21 What is $i^i$? Ans: If you’d like to check your result, type i^i into Google. Or use a calculator such as the one 
mentioned on page 7. 

3.22 For what argument does $\sin\theta = 2$? Next: $\cos\theta = 2$? 
Ans: $\sin^{-1} 2 = 1.5708 \pm i\,1.3170$ 


3.23 What are the other trigonometric functions, $\tan(ix)$, $\sec(ix)$, etc.? What are tan and sec for the general argument 
$x + iy$? 

Ans: $\tan(x + iy) = (\tan x + i\tanh y)/(1 - i\tan x\tanh y)$ 


3.24 The diffraction pattern from a grating involves the sum of waves from a large number of parallel slits. For light 
observed at an angle $\theta$ away from directly ahead, this sum is, for $N + 1$ slits, 

$$\cos(kr_0 - \omega t) + \cos\left(k(r_0 - d\sin\theta) - \omega t\right) + \cos\left(k(r_0 - 2d\sin\theta) - \omega t\right) + \cdots + \cos\left(k(r_0 - Nd\sin\theta) - \omega t\right)$$

Express this as the real part of complex exponentials and sum the finite series. Show that the resulting wave is 

$$\frac{\sin\left(\frac{1}{2}(N + 1)kd\sin\theta\right)}{\sin\left(\frac{1}{2}kd\sin\theta\right)}\,\cos\left(k\left(r_0 - \tfrac{1}{2}Nd\sin\theta\right) - \omega t\right)$$

Interpret this result as a wave that appears to be coming from some particular point (where?) and with an intensity 
pattern that varies strongly with $\theta$. 


3.25 (a) If the coefficients in a quadratic equation are real, show that if $z$ is a complex root of the equation then so is 
$z^*$. If you do this by reference to the quadratic formula, you’d better find another way too, because the second part of 
this problem is 

(b) Generalize this to the roots of an arbitrary polynomial with real coefficients. 


3.26 You can represent the motion of a particle in two dimensions by using a time-dependent complex number with 
$z = x + iy = re^{i\theta}$ showing its rectangular or polar coordinates. Assume that $r$ and $\theta$ are functions of time and 
differentiate $re^{i\theta}$ to get the velocity. Differentiate it again to get the acceleration. You can interpret $e^{i\theta}$ as the unit 
vector along the radius and $ie^{i\theta}$ as the unit vector perpendicular to the radius and pointing in the direction of increasing 
theta. Show that 

$$\frac{d^2 z}{dt^2} = e^{i\theta}\left[\frac{d^2 r}{dt^2} - r\left(\frac{d\theta}{dt}\right)^2\right] + ie^{i\theta}\left[r\frac{d^2\theta}{dt^2} + 2\frac{dr}{dt}\frac{d\theta}{dt}\right] \tag{3.17}$$

and translate this into the usual language of components of vectors, getting the radial ($r$) component of acceleration 
and the angular component of acceleration as in section 8.9. 


3.27 Use the results of the preceding problem, and examine the case of a particle moving directly away from the origin. 
(a) What is its acceleration? (b) If instead, it is moving at $r$ = constant, what is its acceleration? (c) If instead, $x = x_0$ 
and $y = v_0 t$, what are $r(t)$ and $\theta(t)$? Now compute $d^2 z/dt^2$ from Eq. (3.17). 

3.28 Was it really legitimate simply to substitute $x + iy$ for $\theta_1 + \theta_2$ in Eq. (3.11) to get $\cos(x + iy)$? Verify the result 
by substituting the expressions for $\cos x$ and for $\cosh y$ as exponentials to see if you can reconstruct the left-hand side. 

3.29 The roots of the quadratic equation $z^2 + bz + c = 0$ are functions of the parameters $b$ and $c$. For real $b$ and $c$ and 
for both cases $c > 0$ and $c < 0$ (say $\pm 1$ to be specific) plot the trajectories of the roots in the complex plane as $b$ varies 
from $-\infty$ to $+\infty$. You should find various combinations of straight lines and arcs of circles. 


3.30 In integral tables you can find the integrals for such functions as 

$$\int dx\, e^{ax}\cos bx, \quad\text{or}\quad \int dx\, e^{ax}\sin bx$$

Show how easy it is to do these by doing both integrals at once. Do the first plus $i$ times the second and then separate 
the real and imaginary parts. 


3.31 Find the sum of the series 

$$\sum_{n=1}^{\infty} \frac{i^n}{n}$$

Ans: $i\pi/4 - \frac{1}{2}\ln 2$ 

3.32 Evaluate $|\cos z|^2$. Evaluate $|\sin z|^2$. 

3.33 Evaluate $\sqrt{1 + i}$. Evaluate $\ln(1 + i)$. Evaluate $\tan(1 + i)$. 


3.34 (a) Beats occur in sound when two sources emit two frequencies that are almost the same. The perceived wave is 
the sum of the two waves, so that at your ear, the wave is a sum of two cosines of $\omega_1 t$ and of $\omega_2 t$. Use complex algebra 
to evaluate this. The sum is the real part of 

$$e^{i\omega_1 t} + e^{i\omega_2 t}$$

Notice the two identities 

$$\omega_1 = \frac{\omega_1 + \omega_2}{2} + \frac{\omega_1 - \omega_2}{2}$$

and the difference of these for $\omega_2$. Use the complex exponentials to derive the results; don’t just look up some trig 
identity. Factor the resulting expression and sketch a graph of the resulting real part, interpreting the result in terms of 
beats if the two frequencies are close to each other. (b) In the process of doing this problem using complex exponentials, 
what is the trigonometric identity for the sum of two cosines? While you’re about it, what is the difference of two 
cosines? 

Ans: $\cos\omega_1 t + \cos\omega_2 t = 2\cos\frac{1}{2}(\omega_1 + \omega_2)t\,\cos\frac{1}{2}(\omega_1 - \omega_2)t$ 


3.35 Derive using complex exponentials: $\sin x - \sin y = 2\sin\left(\frac{x - y}{2}\right)\cos\left(\frac{x + y}{2}\right)$. 

3.36 The equation (3.4) assumed that the usual rule for multiplying exponentials still holds when you are using complex 
numbers. Does it? You can prove it by looking at the infinite series representation for the exponential and showing that 

$$\left[1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots\right]\left[1 + b + \frac{b^2}{2!} + \frac{b^3}{3!} + \cdots\right] = \left[1 + (a + b) + \frac{(a + b)^2}{2!} + \cdots\right]$$

You may find Eq. (2.19) useful. 

3.37 Look at the vertical lines in the $z$-plane as mapped by Eq. (3.16). I drew the images of lines $y$ = constant, now 
you draw the images of the straight line segments $x$ = constant from $0 < y < \pi$. The two sets of lines in the original 
plane intersect at right angles. What is the angle of intersection of the corresponding curves in the image? 

3.38 Instead of drawing the image of the lines $x$ = constant as in the previous problem, draw the image of the line 
$y = x\tan\alpha$, the line that makes an angle $\alpha$ with the horizontal lines. The images of the horizontal lines were radial lines. 
At a point where this curve intersects one of the radial lines, what angle does the curve make with the radial line? Show 
that the answer is $\alpha$, the same angle of intersection as in the original picture. 

3.39 Write each of these functions of $z$ as two real functions $u$ and $v$ such that $f(z) = u(x, y) + iv(x, y)$. 

$$z^3, \qquad \frac{1 + z}{1 - z}, \qquad \frac{1}{z^2}, \qquad \frac{z}{z^*}$$

3.40 Evaluate $z^i$ where $z$ is an arbitrary complex number, $z = x + iy = re^{i\theta}$. 

3.41 What is the image of the domain $-\infty < x < +\infty$ and $0 < y < \pi$ under the function $w = \sqrt{z}$? Ans: One 
boundary is a hyperbola. 

3.42 What is the image of the disk $|z - a| < b$ under the function $w = cz + d$? Allow $c$ and $d$ to be complex. Take $a$ 
real. 

3.43 What is the image of the disk $|z - a| < b$ under the function $w = 1/z$? Assume $b < a$. Ans: Another disk, 
centered at $a/(a^2 - b^2)$. 

3.44 (a) Multiply $(2 + i)(3 + i)$ and deduce the identity 

$$\tan^{-1}(1/2) + \tan^{-1}(1/3) = \pi/4$$

(b) Multiply $(5 + i)^4(-239 + i)$ and deduce 

$$4\tan^{-1}(1/5) - \tan^{-1}(1/239) = \pi/4$$

For (b) a sketch will help sort out some signs. 

(c) Using the power series representation of the $\tan^{-1}$, Eq. (2.27), how many terms would it take to compute 100 
digits of $\pi$ as $4\tan^{-1} 1$? How many terms would it take using each of these two representations, (a) and (b), for $\pi$? 
Ans: Almost a googol versus respectively about 540 and a few more than 180 terms. 

3.45 Use Eq. (3.9) and look back at the development of Eq. (1.4) to find the $\sin^{-1}$ and $\cos^{-1}$ in terms of logarithms. 

3.46 Evaluate the integral $\int_{-\infty}^{\infty} dx\, e^{-\alpha x^2}\cos\beta x$ for fixed real $\alpha$ and $\beta$. Sketch a graph of the result versus $\beta$. Sketch a 
graph of the result versus $\alpha$, and why does the graph behave as it does? Notice the rate at which the result approaches 
zero as either $\alpha \to 0$ or $\alpha \to \infty$. The behavior is very different in the two cases. Ans: $e^{-\beta^2/4\alpha}\sqrt{\pi/\alpha}$. 

3.48 Compute (a) $\sin^{-1} i$. (b) $\cos^{-1} i$. (c) $\tan^{-1} i$. (d) $\sinh^{-1} i$. Ans: $\sin^{-1} i = 0 + 0.881i$, $\cos^{-1} i = \pi/2 - 0.881i$. 

3.49 By writing 

$$\frac{1}{1 + x^2} = \frac{i}{2}\left[\frac{1}{x + i} - \frac{1}{x - i}\right]$$

and integrating, check the equation 

$$\int_0^1 \frac{dx}{1 + x^2} = \frac{\pi}{4}$$


3.50 Solve the equations (a) $\cosh u = 0$ (b) $\tanh u = 2$ (c) $\operatorname{sech} u = 2i$

Ans: $\operatorname{sech}^{-1} 2i = 0.4812 - i\,1.5707$

3.51 Solve the equations (a) $z - 2z^* = 1$ (b) $z^3 - 3z^2 + 4z = 2$ after verifying that 1 + i is a root. Compare
the result of problem 3.25.


3.52 Confirm the plot of $\ln(1 + iy)$ following Eq. (3.15). Also do the corresponding plots for $\ln(10 + iy)$ and $\ln(100 + iy)$.
And what do these graphs look like if you take the other branches of the logarithm, with the $i(\theta + 2n\pi)$?

3.53 Check that the results of Eq. (3.14) for cosines and for sines give the correct results for small $\theta$. What about
$\theta \to 2\pi$?


3.54 Finish the calculation leading to Eq. (3.13), thereby proving that the two indicated lines have the same length and 
are perpendicular. 


3.55 In the same spirit as Eq. (3.13) concerning squares drawn on the sides of an arbitrary quadrilateral, 
start with an arbitrary triangle and draw equilateral triangles on each side. Find the centroids of each 
of the equilateral triangles and connect them. The result is an equilateral triangle. Recall: the centroid 
is one third the distance from the base to the vertex. [This one requires more algebra than the one in 
the text.] (Napoleon's Theorem) 



Differential Equations 


The subject of ordinary differential equations encompasses such a large field that you can make a profession of it. There 
are however a small number of techniques in the subject that you have to know. These are the ones that come up so 
often in physical systems that you need both the skills to use them and the intuition about what they will do. That small 
group of methods is what I'll concentrate on in this chapter. 


4.1 Linear Constant-Coefficient 

A differential equation such as

$$\left(\frac{d^2x}{dt^2}\right)^3 + t^2x^4 + 1 = 0$$

relating acceleration to position and time, is not one that I’m especially eager to solve, and one of the things that makes
it difficult is that it is non-linear. This means that starting with two solutions $x_1(t)$ and $x_2(t)$, the sum $x_1 + x_2$ is not a
solution; look at all the cross-terms you get if you try to plug the sum into the equation and have to cube the sum of
the second derivatives. Also if you multiply $x_1(t)$ itself by 2 you no longer have a solution.

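An equation such as

$$e^t\frac{d^3x}{dt^3} + t^2\frac{dx}{dt} - x = 0$$

may be a mess to solve, but if you have two solutions, $x_1(t)$ and $x_2(t)$, then the sum $\alpha x_1 + \beta x_2$ is also a solution.
Proof? Plug in:

$$e^t\frac{d^3(\alpha x_1 + \beta x_2)}{dt^3} + t^2\frac{d(\alpha x_1 + \beta x_2)}{dt} - (\alpha x_1 + \beta x_2)
= \alpha\left[e^t\frac{d^3x_1}{dt^3} + t^2\frac{dx_1}{dt} - x_1\right] + \beta\left[e^t\frac{d^3x_2}{dt^3} + t^2\frac{dx_2}{dt} - x_2\right] = 0$$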


This is called a linear, homogeneous equation because of this property. A similar-looking equation,

$$e^t\frac{d^3x}{dt^3} + t^2\frac{dx}{dt} - x = t$$

does not have this property, though it’s close. It is called a linear, inhomogeneous equation. If $x_1(t)$ and $x_2(t)$ are
solutions to this, then if I try their sum as a solution I get $2t = t$, and that's no solution, but it misses working only
because of the single term on the right, and that will make it not too far removed from the preceding case.

One of the most common sorts of differential equations that you see is an especially simple one to solve. That's
part of the reason it's so common. This is the linear, constant-coefficient, differential equation. If you have a mass tied 
to the end of a spring and the other end of the spring is fixed, the force applied to the mass by the spring is to a good 
approximation proportional to the distance that the mass has moved from its equilibrium position. 

If the coordinate x is measured from the mass's equilibrium position, the equation F = ma says 



$$m\frac{d^2x}{dt^2} = -kx \tag{4.1}$$


If there's friction (and there’s always friction), the force has another term. Now how do you describe friction mathematically?
The common model for dry friction is that the magnitude of the force is independent of the magnitude of the
mass's velocity and opposite to the direction of the velocity. If you try to write that down in a compact mathematical
form you get something like

$$\vec F_{\text{friction}} = -\mu_k F_N \frac{\vec v}{|\vec v|} \tag{4.2}$$

This is hard to work with. It can be done, but I’m going to do something different. (See problem 4.31 however.) Wet
friction is easier to handle mathematically because when you lubricate a surface, the friction becomes velocity dependent
in a way that is, for low speeds, proportional to the velocity.

$$\vec F_{\text{friction}} = -b\vec v \tag{4.3}$$


Neither of these two representations is a completely accurate description of the way friction works. That's far more 
complex than either of these simple models, but these approximations are good enough for many purposes and I’ll settle 
for them. 

Assume “wet friction” and the differential equation for the motion of m is

$$m\frac{d^2x}{dt^2} = -kx - b\frac{dx}{dt} \tag{4.4}$$


This is a second order, linear, homogeneous differential equation, which simply means that the highest derivative present 
is the second, the sum of two solutions is a solution, and a constant multiple of a solution is a solution. That the 
coefficients are constants makes this an easy equation to solve. 

All you have to do is to recall that the derivative of an exponential is an exponential: $de^t/dt = e^t$. Substitute
this exponential for x(t), and of course it can’t work as a solution; it doesn't even make sense dimensionally. What is
e to the power of a day? You need something in the exponent to make it dimensionless, $e^{\alpha t}$. Also, the function x is
supposed to give you a position, with dimensions of length. Use another constant: $x(t) = Ae^{\alpha t}$. Plug this into the
differential equation (4.4) to find

$$mA\alpha^2e^{\alpha t} + bA\alpha e^{\alpha t} + kAe^{\alpha t} = Ae^{\alpha t}\left[m\alpha^2 + b\alpha + k\right] = 0$$

The product of factors is zero, and the only way that a product of two numbers can be zero is if one of the numbers is
zero. The exponential never vanishes, and for a non-trivial solution $A \ne 0$, so all that's left is the polynomial in $\alpha$.


$$m\alpha^2 + b\alpha + k = 0, \qquad\text{with solutions}\qquad \alpha = \frac{-b \pm \sqrt{b^2 - 4km}}{2m} \tag{4.5}$$

The position function is then

$$x(t) = Ae^{\alpha_1 t} + Be^{\alpha_2 t} \tag{4.6}$$

where A and B are arbitrary constants and $\alpha_1$ and $\alpha_2$ are the two roots.

Isn't this supposed to be oscillating? It is a harmonic oscillator after all, but the exponentials don't look very
oscillatory. If you have a mass on the end of a spring and the entire system is immersed in honey, it won’t do much
oscillating! Translated into mathematics, this says that if the constant b is too large, there is no oscillation. In the
equation for $\alpha$, if b is large enough the argument of the square root is positive, and both $\alpha$'s are real: no oscillation.
Only if b is small enough does the argument of the square root become negative; then you get complex values for the
$\alpha$'s and hence oscillations.
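
The dependence on b is easy to try numerically. What follows is only a minimal sketch: the function names, the labels "overdamped" and "oscillatory", and the sample values m = k = 1 are my own choices for illustration, not anything fixed by the text. It computes the two roots of the quadratic in Eq. (4.5) and reports which case the sign of $b^2 - 4km$ puts you in.

```python
import cmath

def characteristic_roots(m, b, k):
    """Roots of m*alpha**2 + b*alpha + k = 0, Eq. (4.5)."""
    disc = cmath.sqrt(b * b - 4 * k * m)
    return (-b + disc) / (2 * m), (-b - disc) / (2 * m)

def regime(m, b, k):
    """Classify the motion by the sign of b**2 - 4*k*m (labels are illustrative)."""
    d = b * b - 4 * k * m
    if d > 0:
        return "overdamped"          # two real roots, no oscillation
    if d == 0:
        return "critically damped"   # the two roots coincide
    return "oscillatory"             # complex roots

for b in (0.0, 0.5, 2.0, 5.0):       # sample damping constants with m = k = 1
    a1, a2 = characteristic_roots(1.0, b, 1.0)
    print(b, regime(1.0, b, 1.0), a1, a2)
```

Pushing b up from zero you can watch the imaginary parts of the roots shrink and then disappear.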

Push this to the extreme case where the damping vanishes: b = 0. Then $\alpha_1 = i\sqrt{k/m}$ and $\alpha_2 = -i\sqrt{k/m}$.
Denote $\omega_0 = \sqrt{k/m}$.

$$x(t) = Ae^{i\omega_0 t} + Be^{-i\omega_0 t} \tag{4.7}$$

You can write this in other forms using sines and cosines, see problem 4.10. To determine the arbitrary constants A
and B you need two equations. They come from some additional information about the problem, typically some initial
conditions. Take a specific example in which you start from the origin with a kick, $x(0) = 0$ and $\dot x(0) = v_0$.

$$x(0) = 0 = A + B, \qquad \dot x(0) = v_0 = i\omega_0 A - i\omega_0 B$$

Solve for A and B to get $A = -B = v_0/(2i\omega_0)$. Then

$$x(t) = \frac{v_0}{2i\omega_0}\left[e^{i\omega_0 t} - e^{-i\omega_0 t}\right] = \frac{v_0}{\omega_0}\sin\omega_0 t$$



As a check on the algebra, use the first term in the power series expansion of the sine function to see how x behaves
for small t. The sine factor is $\sin\omega_0 t \approx \omega_0 t$, and then x(t) is approximately $v_0 t$, just as it should be. Also notice that
despite all the complex numbers, the final answer is real. This is another check on the algebra.

Damped Oscillator 

If there is damping, but not too much, then the $\alpha$’s have an imaginary part and a negative real part. (Is it important
whether it’s negative or not?)

$$\alpha = \frac{-b \pm i\sqrt{4km - b^2}}{2m} = -\frac{b}{2m} \pm i\omega', \qquad\text{where}\qquad \omega' = \sqrt{\frac{k}{m} - \frac{b^2}{4m^2}} \tag{4.8}$$

This represents a damped oscillation and has frequency a bit lower than the one in the undamped case. Use the same
initial conditions as above and you will get similar results (let $\gamma = b/2m$)

$$x(t) = Ae^{(-\gamma + i\omega')t} + Be^{(-\gamma - i\omega')t}$$
$$x(0) = A + B = 0, \qquad v_x(0) = (-\gamma + i\omega')A + (-\gamma - i\omega')B = v_0 \tag{4.9}$$

The two equations for the unknowns A and B imply $B = -A$ and

$$2i\omega' A = v_0, \qquad\text{so}\qquad x(t) = \frac{v_0}{2i\omega'}e^{-\gamma t}\left[e^{i\omega' t} - e^{-i\omega' t}\right] = \frac{v_0}{\omega'}e^{-\gamma t}\sin\omega' t \tag{4.10}$$



For small values of t, the first terms in the power series expansion of this result are

$$x(t) = \frac{v_0}{\omega'}\left[1 - \gamma t + \gamma^2t^2/2 - \dots\right]\left[\omega' t - \omega'^3t^3/6 + \dots\right] = v_0 t - v_0\gamma t^2 + \dots$$

The first term is what you should expect, as the initial velocity is $v_x = v_0$. The negative sign in the next term says that
it doesn't move as far as it would without the damping, but analyze it further. Does it have the right size as well as the
right sign? It is $-v_0\gamma t^2 = -v_0(b/2m)t^2$. But that’s an acceleration: $a_xt^2/2$. It says that the acceleration just after the
motion starts is $a_x = -bv_0/m$. Is that what you should expect? As the motion starts, the mass hasn't gone very far so
the spring doesn't yet exert much force. The viscous friction is however $-bv_x$. Set that equal to $ma_x$ and you see that
$-v_0\gamma t^2$ has precisely the right value:

$$x(t) \approx v_0 t - v_0\gamma t^2 = v_0 t - v_0\frac{b}{2m}t^2 = v_0 t + \frac{1}{2}\,\frac{-bv_0}{m}\,t^2$$

The last term says that the acceleration starts as $a_x = -bv_0/m$, as required.
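
If you would rather let a machine do this sort of check, here is a small sketch. The numerical values of m, b, k and $v_0$ are arbitrary assumptions made for illustration. It compares the damped solution, Eq. (4.10), with the two-term approximation $v_0t - v_0\gamma t^2$; if the expansion is right, the difference should shrink like $t^3$.

```python
import math

m, b, k, v0 = 1.0, 0.4, 4.0, 1.0        # illustrative values (assumed)
gamma = b / (2 * m)
omega_p = math.sqrt(k / m - gamma**2)   # omega' of Eq. (4.8)

def x_exact(t):
    """Eq. (4.10)."""
    return (v0 / omega_p) * math.exp(-gamma * t) * math.sin(omega_p * t)

def x_small_t(t):
    """First two terms of the expansion: v0*t - v0*gamma*t**2."""
    return v0 * t - v0 * gamma * t**2

for t in (0.1, 0.01, 0.001):
    err = x_exact(t) - x_small_t(t)
    print(t, err, err / t**3)           # err / t**3 should settle to a constant
```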

In Eq. (4.8) I assumed that the two roots of the quadratic, the two $\alpha$’s, are different. What if they aren't? Then
you have just one value of $\alpha$ to use in defining the solution $e^{\alpha t}$ in Eq. (4.9). You now have just one arbitrary constant
with which to match two initial conditions. You're stuck. See problem 4.11 to understand how to handle this case
(critical damping). It's really a special case of what I’ve already done.

What is the energy for this damped oscillator? The kinetic energy is $mv^2/2$ and the potential energy for the
spring is $kx^2/2$. Is the sum constant? No.

If $F_x = ma_x = -kx + F_{x,\text{frict}}$, then

$$\frac{dE}{dt} = \frac{d}{dt}\,\frac{1}{2}\left(mv^2 + kx^2\right) = mv\frac{dv}{dt} + kx\frac{dx}{dt} = v_x\left(ma_x + kx\right) = F_{x,\text{frict}}\,v_x \tag{4.11}$$

“Force times velocity” is a common expression for power, and this says that the total energy is decreasing according to
this formula. For the wet friction used here, this is $dE/dt = -bv_x^2$, and the energy decreases exponentially on average.
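
The energy statement can be tested against the explicit damped solution. This is a sketch under assumed parameter values; the velocity formula in it is my own hand differentiation of Eq. (4.10), so it is itself something to check. It compares a finite-difference estimate of $dE/dt$ with $-bv_x^2$ at a few times.

```python
import math

m, b, k, v0 = 1.0, 0.4, 4.0, 1.0        # illustrative values (assumed)
gamma = b / (2 * m)
omega_p = math.sqrt(k / m - gamma**2)

def x(t):
    """Position, Eq. (4.10)."""
    return (v0 / omega_p) * math.exp(-gamma * t) * math.sin(omega_p * t)

def v(t):
    """Velocity: the time derivative of Eq. (4.10), worked out by hand."""
    return v0 * math.exp(-gamma * t) * (math.cos(omega_p * t)
                                        - (gamma / omega_p) * math.sin(omega_p * t))

def energy(t):
    return 0.5 * m * v(t)**2 + 0.5 * k * x(t)**2

def dE_dt(t, h=1e-5):
    """Centered finite difference of the total energy."""
    return (energy(t + h) - energy(t - h)) / (2 * h)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, dE_dt(t), -b * v(t)**2)    # the two columns should agree
```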

4.2 Forced Oscillations 

What happens if the equation is inhomogeneous? That is, what if there is a term that doesn’t involve x or its derivatives
at all? In this harmonic oscillator example, apply an extra external force. Maybe it's a constant; maybe it's an oscillating
force; it can be anything you want not involving x.

$$m\frac{d^2x}{dt^2} = -kx - b\frac{dx}{dt} + F_{\text{ext}}(t) \tag{4.12}$$

The key result that you need for this class of equations is very simple to state and not too difficult to implement. It is a 
procedure for attacking any linear inhomogeneous differential equation and consists of three steps. 

1. Temporarily throw out the inhomogeneous term [here $F_{\text{ext}}(t)$] and completely solve the resulting homogeneous
equation. In the current case that's what you just saw when I worked out the solution to the
differential equation $m\,d^2x/dt^2 + b\,dx/dt + kx = 0$. [$x_{\text{hom}}(t)$]

2. Find any one solution to the full inhomogeneous equation. Note that for step one you have to have all
the arbitrary constants present; for step two you do not. [$x_{\text{inh}}(t)$]

3. Add the results of steps one and two. [$x_{\text{hom}}(t) + x_{\text{inh}}(t)$]

I've already done step one. To carry out the next step I’ll start with a particular case of the forcing function. If 
$F_{\text{ext}}(t)$ is simple enough, you should be able to guess the answer to step two. If it’s a constant, then a constant will
work for x. If it’s a sine or cosine, then you can guess that a sine or cosine or a combination of the two should work. If 
it's an exponential, then guess an exponential — remember that the derivative of an exponential is an exponential. If 
it's the sum of two terms, such as a constant and an exponential, it’s easy to verify that you add the results that you get 
for the two cases separately. If the forcing function is too complicated for you to guess a solution then there’s a general 
method using Green's functions that I'll get to in section 4.6. 

Choose a specific example 

$$F_{\text{ext}}(t) = F_0\left[1 - e^{-\beta t}\right] \tag{4.13}$$

This starts at zero and builds up to a final value of $F_0$. It does it slowly or quickly depending on $\beta$.



Start with the first term, $F_0$, for external force in Eq. (4.12). Try x(t) = C and plug into that equation to find

$$kC = F_0$$

This is simple and determines C.

Next, use the second term as the forcing function, $-F_0e^{-\beta t}$. Guess a solution $x(t) = C'e^{-\beta t}$ and plug in. The
exponential cancels, leaving

$$mC'\beta^2 - bC'\beta + kC' = -F_0 \qquad\text{or}\qquad C' = \frac{-F_0}{m\beta^2 - b\beta + k}$$

The total solution for the inhomogeneous part of the equation is then the sum of these two expressions.

$$x_{\text{inh}}(t) = F_0\left(\frac{1}{k} - \frac{1}{m\beta^2 - b\beta + k}e^{-\beta t}\right)$$

The homogeneous part of Eq. (4.12) has the solution found in Eq. (4.6) and the total is

$$x(t) = x_{\text{hom}}(t) + x_{\text{inh}}(t) = Ae^{\alpha_1 t} + Be^{\alpha_2 t} + F_0\left(\frac{1}{k} - \frac{1}{m\beta^2 - b\beta + k}e^{-\beta t}\right) \tag{4.14}$$

There are two arbitrary constants here, and this is what you need because you have to be able to specify the initial
position and the initial velocity independently; this is a second order differential equation after all. Take for example the 
conditions that the initial position is zero and the initial velocity is zero. Everything is at rest until you start applying 
the external force. This provides two equations for the two unknowns. 


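$$x(0) = 0 = A + B + F_0\frac{m\beta^2 - b\beta}{k(m\beta^2 - b\beta + k)}$$
$$\dot x(0) = 0 = A\alpha_1 + B\alpha_2 + F_0\frac{\beta}{m\beta^2 - b\beta + k}$$

Now all you have to do is solve the two equations in the two unknowns A and B. Take the first, multiply it by $\alpha_2$ and
subtract the second. This gives A. Do the same with $\alpha_1$ instead of $\alpha_2$ to get B. The results are

$$A = \frac{1}{\alpha_1 - \alpha_2}\,F_0\,\frac{\alpha_2(m\beta^2 - b\beta) - k\beta}{k(m\beta^2 - b\beta + k)}$$

Interchange $\alpha_1$ and $\alpha_2$ to get B.
The final result is

$$x(t) = \frac{F_0}{\alpha_1 - \alpha_2}\,\frac{\left(\alpha_2(m\beta^2 - b\beta) - k\beta\right)e^{\alpha_1 t} - \left(\alpha_1(m\beta^2 - b\beta) - k\beta\right)e^{\alpha_2 t}}{k(m\beta^2 - b\beta + k)} + F_0\left(\frac{1}{k} - \frac{1}{m\beta^2 - b\beta + k}e^{-\beta t}\right) \tag{4.15}$$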


If you think this is messy and complicated, you haven't seen messy and complicated. When it takes 20 pages to write
out the equation, then you’re entitled to say that it is starting to become involved.

Why not start with a simpler example, one without all the terms? The reason is that a complex expression is often 
easier to analyze than a simple one. There are more things that you can do to it, and so more opportunities for it to go 
wrong. The problem isn’t finished until you've analyzed the supposed solution. After all, I may have made some errors 
in algebra along the way. Also, analyzing the solution is the way you learn how these functions work. 
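
One blunt way to hunt for algebra errors is to evaluate the supposed solution numerically. The sketch below assumes some arbitrary parameter values (my choice, for illustration only) and codes Eq. (4.15) directly with complex arithmetic. It then prints the initial position, the initial velocity, the leftover imaginary part, and the residual of the differential equation (4.12) with the force (4.13). All four numbers should be close to zero. Note that it divides by $\alpha_1 - \alpha_2$, so it does not cover the case of equal roots.

```python
import cmath
import math

m, b, k, F0, beta = 1.0, 0.4, 4.0, 1.0, 0.7    # illustrative values (assumed)

disc = cmath.sqrt(b * b - 4 * k * m)
a1, a2 = (-b + disc) / (2 * m), (-b - disc) / (2 * m)   # roots from Eq. (4.5)
D = m * beta**2 - b * beta + k                           # denominator shared by the terms

def x(t):
    """Eq. (4.15), evaluated with complex arithmetic."""
    c1 = a2 * (m * beta**2 - b * beta) - k * beta
    c2 = a1 * (m * beta**2 - b * beta) - k * beta
    hom = F0 / (a1 - a2) * (c1 * cmath.exp(a1 * t) - c2 * cmath.exp(a2 * t)) / (k * D)
    inh = F0 * (1 / k - math.exp(-beta * t) / D)
    return hom + inh

def residual(t, h=1e-4):
    """m x'' + b x' + k x - F_ext(t), with derivatives by centered differences."""
    xp = (x(t + h) - x(t - h)) / (2 * h)
    xpp = (x(t + h) - 2 * x(t) + x(t - h)) / h**2
    return m * xpp + b * xp + k * x(t) - F0 * (1 - math.exp(-beta * t))

print(abs(x(0.0)))                             # initial position
print(abs((x(1e-6) - x(-1e-6)) / 2e-6))        # initial velocity
print(abs(x(1.3).imag), abs(residual(1.3)))    # imaginary part, ODE residual
```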

1. Everything in the solution is proportional to $F_0$ and that's not surprising.

2. I’ll leave it as an exercise to check the dimensions.

3. A key parameter to vary is $\beta$. What should happen if it is either very large or very small? In the former
case the exponential function in the force drops to zero quickly so the force jumps from zero to $F_0$ in a
very short time, a step in the limit that $\beta \to \infty$.

4. If $\beta$ is very small the force turns on very gradually and gently, as though you are being very careful not
to disturb the system.

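Take point 3 above: for large $\beta$ the dominant terms in both numerator and denominator everywhere are the $m\beta^2$
terms. This result is then very nearly

$$x(t) \approx \frac{F_0}{\alpha_1 - \alpha_2}\,\frac{\alpha_2(m\beta^2)e^{\alpha_1 t} - \alpha_1(m\beta^2)e^{\alpha_2 t}}{km\beta^2} + F_0\left(\frac{1}{k} - \frac{1}{m\beta^2}e^{-\beta t}\right)
\approx \frac{F_0}{k(\alpha_1 - \alpha_2)}\left[\alpha_2e^{\alpha_1 t} - \alpha_1e^{\alpha_2 t}\right] + F_0\frac{1}{k}$$

Use the notation of Eq. (4.9) and you have

$$\begin{aligned}
x(t) &\approx \frac{F_0}{k\left(-\gamma + i\omega' - (-\gamma - i\omega')\right)}\left[(-\gamma - i\omega')e^{(-\gamma + i\omega')t} - (-\gamma + i\omega')e^{(-\gamma - i\omega')t}\right] + F_0\frac{1}{k} \\
&= \frac{F_0e^{-\gamma t}}{2i\omega' k}\left[-2i\gamma\sin\omega' t - 2i\omega'\cos\omega' t\right] + F_0\frac{1}{k} \\
&= \frac{F_0e^{-\gamma t}}{k}\left[-\frac{\gamma}{\omega'}\sin\omega' t - \cos\omega' t\right] + F_0\frac{1}{k}
\end{aligned} \tag{4.16}$$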



At time t = 0 this is still zero even with the approximations. That's comforting, but if it hadn't happened it’s not 
an insurmountable disaster. This is an approximation to the exact answer after all, so it could happen that the initial 
conditions are obeyed only approximately. The exponential terms have oscillations and damping, so the mass oscillates 
about its eventual equilibrium position and after a long enough time the oscillations die out and you are left with the 
equilibrium solution $x = F_0/k$.

Look at point 4 above: For small $\beta$ the $\beta^2$ terms in Eq. (4.15) are small compared to the $\beta$ terms to which they
are added or subtracted. The numerators of the terms with $e^{\alpha t}$ are then proportional to $\beta$. The denominator of the
same terms has a $k - b\beta$ in it. That means that as $\beta \to 0$, the numerator of the homogeneous term approaches zero and
its denominator doesn't. The last terms, that came from the inhomogeneous part, don’t have any $\beta$ in the numerator
so they don’t vanish in this limit. The approximate final result then comes solely from the $x_{\text{inh}}(t)$ term.

$$x(t) \approx F_0\frac{1}{k}\left(1 - e^{-\beta t}\right)$$

It doesn't oscillate at all and just gradually moves from equilibrium to equilibrium as time goes on. It’s what you get if
you go back to the differential equation (4.12) and say that the acceleration and the velocity are negligible.

$$m\frac{d^2x}{dt^2}\,[\approx 0] = -kx - b\frac{dx}{dt}\,[\approx 0] + F_{\text{ext}}(t) \quad\Longrightarrow\quad x \approx \frac{1}{k}F_{\text{ext}}(t)$$

The spring force nearly balances the external force at all times; this is “quasi-static,” in which the external force is turned
on so slowly that it doesn't cause any oscillations.
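
Both limits can be watched numerically. In this sketch the parameter values and the function names are assumptions chosen for illustration, and the damping is taken small enough that the roots are complex. It evaluates the full solution, Eq. (4.15), for a very large and a very small $\beta$ and compares with the step response of Eq. (4.16) and with the quasi-static form $F_{\text{ext}}(t)/k$.

```python
import cmath
import math

m, b, k, F0 = 1.0, 0.4, 4.0, 1.0               # illustrative values (assumed)
gamma = b / (2 * m)
omega_p = math.sqrt(k / m - gamma**2)          # requires b**2 < 4*k*m
a1, a2 = complex(-gamma, omega_p), complex(-gamma, -omega_p)

def x_full(t, beta):
    """Real part of Eq. (4.15)."""
    D = m * beta**2 - b * beta + k
    c1 = a2 * (m * beta**2 - b * beta) - k * beta
    c2 = a1 * (m * beta**2 - b * beta) - k * beta
    hom = F0 / (a1 - a2) * (c1 * cmath.exp(a1 * t) - c2 * cmath.exp(a2 * t)) / (k * D)
    return (hom + F0 * (1 / k - math.exp(-beta * t) / D)).real

def x_step(t):
    """Large-beta limit, Eq. (4.16)."""
    return (F0 / k) * (1 - math.exp(-gamma * t)
                       * ((gamma / omega_p) * math.sin(omega_p * t) + math.cos(omega_p * t)))

def x_quasi_static(t, beta):
    """Small-beta limit: the spring force balances F_ext(t)."""
    return (F0 / k) * (1 - math.exp(-beta * t))

print(x_full(2.0, 1e4), x_step(2.0))                      # sudden turn-on
print(x_full(100.0, 1e-3), x_quasi_static(100.0, 1e-3))   # gentle turn-on
```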

4.3 Series Solutions 

A linear, second order differential equation can always be rearranged into the form

$$y'' + P(x)y' + Q(x)y = R(x) \tag{4.17}$$

If at some point $x_0$ the functions P and Q are well-behaved, if they have convergent power series expansions about $x_0$,
then this point is called a “regular point” and you can expect good behavior of the solutions there, at least if R is
also regular there.

I'll look just at the case for which the inhomogeneous term R = 0. If P or Q has a singularity at $x_0$, perhaps
something such as $1/(x - x_0)$ or $\sqrt{x - x_0}$, then $x_0$ is called a “singular point” of the differential equation.

Regular Singular Points 

The most important special case of a singular point is the “regular singular point” for which the behaviors of P and Q
are not too bad. Specifically this requires that $(x - x_0)P(x)$ and $(x - x_0)^2Q(x)$ have no singularity at $x_0$. For example

$$y'' + \frac{1}{x}y' + \frac{1}{x^2}y = 0 \qquad\text{and}\qquad y'' + \frac{1}{x^2}y' + xy = 0$$

have singular points at x = 0, but the first one is a regular singular point and the second one is not. The importance of
a regular singular point is that there is a procedure guaranteed to find a solution near a regular singular point (Frobenius
series). For the more general singular point there is no guaranteed procedure (though there are a few tricks* that
sometimes work).

Examples of equations that show up in physics problems are

$$\begin{aligned}
y'' + y &= 0 \\
(1 - x^2)y'' - 2xy' + \ell(\ell + 1)y &= 0 &&\text{regular singular points at } \pm 1 \\
x^2y'' + xy' + (x^2 - n^2)y &= 0 &&\text{regular singular point at zero} \\
xy'' + (\alpha + 1 - x)y' + ny &= 0 &&\text{regular singular point at zero}
\end{aligned} \tag{4.18}$$


These are respectively the classical simple harmonic oscillator, Legendre equation, Bessel equation, generalized Laguerre 
equation. 

A standard procedure to solve these equations is to use series solutions, but not just the standard power series 
such as those in Eq. (2.4). Essentially, you assume that there is a solution in the form of an infinite series and you 
systematically compute the terms of the series. I'll pick the Bessel equation from the above examples, as the other three 
equations are done the same way. The parameter n in that equation is often an integer, but it can be anything. It’s 
common for it to be 1/2 or 3/2 or sometimes even imaginary, but there's no need to make any assumptions about it for
now. 

Assume a solution in the form:

$$\text{Frobenius Series:}\qquad y(x) = \sum_{k=0}^{\infty} a_kx^{k+s} \qquad (a_0 \ne 0) \tag{4.19}$$


If s = 0 or a positive integer, this is just the standard Taylor series you saw so much of in chapter two, but this simple- 
looking extension makes it much more flexible and suited for differential equations. It often happens that s is a fraction 
or negative, but this case is no harder to handle than the Taylor series. For example, what is the series expansion of 
(cosx)/x about the origin? This is singular at zero, but it’s easy to write the answer anyway because you already know 
the series for the cosine. 

$$\frac{\cos x}{x} = \frac{1}{x} - \frac{x}{2} + \frac{x^3}{24} - \frac{x^5}{720} + \cdots$$

It starts with the term 1/x corresponding to $s = -1$ in the Frobenius series.

* The book by Bender and Orszag: “Advanced mathematical methods for scientists and engineers” is a very readable
source for this and many other topics.

Always assume that $a_0 \ne 0$, because that just defines the coefficient of the most negative power, $x^s$. If you allow
it to be zero, that's just the same as redefining s and it gains nothing except confusion. Plug this into the Bessel differential
equation.

$$\begin{aligned}
x^2y'' + xy' + (x^2 - n^2)y &= 0 \\
x^2\sum_{k=0}^{\infty} a_k(k+s)(k+s-1)x^{k+s-2} + x\sum_{k=0}^{\infty} a_k(k+s)x^{k+s-1} + (x^2 - n^2)\sum_{k=0}^{\infty} a_kx^{k+s} &= 0 \\
\sum_{k=0}^{\infty} a_k(k+s)(k+s-1)x^{k+s} + \sum_{k=0}^{\infty} a_k(k+s)x^{k+s} + \sum_{k=0}^{\infty} a_kx^{k+s+2} - n^2\sum_{k=0}^{\infty} a_kx^{k+s} &= 0 \\
\sum_{k=0}^{\infty} a_k\left[(k+s)(k+s-1) + (k+s) - n^2\right]x^{k+s} + \sum_{k=0}^{\infty} a_kx^{k+s+2} &= 0
\end{aligned}$$

The coefficients of all the like powers of x must match, and in order to work out the matches efficiently, and so as not
to get myself confused in a mess of indices, I’ll make an explicit change of the index in the sums. Do this trick every
time. It keeps you out of trouble.

Let $\ell = k$ in the first sum. Let $\ell = k + 2$ in the second. Explicitly show the limits of the index on the sums, or
you're bound to get it wrong.

$$\sum_{\ell=0}^{\infty} a_\ell\left[(\ell + s)^2 - n^2\right]x^{\ell+s} + \sum_{\ell=2}^{\infty} a_{\ell-2}x^{\ell+s} = 0$$

The lowest power of x in this equation comes from the $\ell = 0$ term in the first sum. That coefficient of $x^s$ must vanish.
($a_0 \ne 0$)

$$a_0\left[s^2 - n^2\right] = 0 \tag{4.20}$$

This is called the indicial equation. It determines s, or in this case, maybe two s's. After this, set to zero the coefficient
of $x^{\ell+s}$.

$$a_\ell\left[(\ell + s)^2 - n^2\right] + a_{\ell-2} = 0 \tag{4.21}$$

This determines $a_2$ in terms of $a_0$; it determines $a_4$ in terms of $a_2$ etc.

$$a_\ell = -a_{\ell-2}\frac{1}{(\ell + s)^2 - n^2}, \qquad \ell = 2, 4, \ldots$$

For example, if n = 0, the indicial equation says s = 0.

$$a_2 = -a_0\frac{1}{2^2}, \qquad a_4 = -a_2\frac{1}{4^2} = +a_0\frac{1}{2^24^2}, \qquad a_6 = -a_4\frac{1}{6^2} = -a_0\frac{1}{2^24^26^2}$$

$$a_{2k} = (-1)^ka_0\frac{1}{2^{2k}k!^2} \qquad\text{then}\qquad y(x) = a_0\sum_{k=0}^{\infty}(-1)^k\frac{(x/2)^{2k}}{(k!)^2} = a_0J_0(x) \tag{4.22}$$

and in the last equation I rearranged the factors and used the standard notation for the Bessel function, $J_n(x)$.
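
The recursion is mechanical enough to hand to a computer. The following is a sketch only: the function names are mine, $a_0$ is set to 1, and the number 2.404825557695773 is the commonly tabulated first zero of $J_0$, quoted here as an outside fact rather than something derived in this chapter. It generates the even coefficients from Eq. (4.21), compares them with the closed form for n = 0, and sums the series near that zero.

```python
import math

def frobenius_coefficients(n, s, terms):
    """a_0 = 1 and a_l = -a_{l-2} / ((l + s)**2 - n**2) for even l, from Eq. (4.21)."""
    coeffs = [1.0]
    for l in range(2, 2 * terms, 2):
        coeffs.append(-coeffs[-1] / ((l + s)**2 - n**2))
    return coeffs                      # coeffs[j] multiplies x**(2*j + s)

def series(x, n, s, terms=30):
    """Partial sum of the Frobenius series with the coefficients above."""
    return sum(a * x**(2 * j + s)
               for j, a in enumerate(frobenius_coefficients(n, s, terms)))

# n = 0, s = 0: compare with a_{2k} = (-1)**k / (2**(2k) * (k!)**2)
for j, a in enumerate(frobenius_coefficients(0, 0, 6)):
    print(j, a, (-1)**j / (2**(2 * j) * math.factorial(j)**2))

print(series(2.404825557695773, 0, 0))   # should be very small near a zero of J_0
```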

This is a second order differential equation. What about the other solution? This Frobenius series method is 
guaranteed to find one solution near a regular singular point. Sometimes it gives both but not always, and in this 
example it produces only one. There are procedures that will let you find the second solution to this sort of second order 
differential equation. See problem 4.49 for one such method. 

For the case n = 1/2 the calculations just above will produce two solutions. The indicial equation gives $s = \pm 1/2$.
After that, the recursion relation for the coefficients gives

$$a_\ell = -a_{\ell-2}\frac{1}{(\ell + s)^2 - n^2} = -a_{\ell-2}\frac{1}{\ell^2 + 2\ell s} = -a_{\ell-2}\frac{1}{\ell(\ell + 2s)} = -a_{\ell-2}\frac{1}{\ell(\ell \pm 1)}$$

For the $s = +1/2$ result

$$a_2 = -a_0\frac{1}{2\cdot 3}, \qquad a_4 = -a_2\frac{1}{4\cdot 5} = +a_0\frac{1}{2\cdot 3\cdot 4\cdot 5}, \qquad a_{2k} = (-1)^ka_0\frac{1}{(2k+1)!}$$

This solution is then

$$y(x) = a_0x^{1/2}\left[1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots\right]$$

This series looks suspiciously like the series for the sine function, but it has some of the x's or some of the factorials in
the wrong place. You can fix that if you multiply the series in brackets by x. You then have

$$y(x) = a_0x^{-1/2}\left[x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right] = a_0\frac{\sin x}{x^{1/2}} \tag{4.23}$$
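
A quick numerical comparison guards against a misplaced factorial. This is a sketch with $a_0 = 1$ and a function name of my own invention: it builds the $s = +1/2$ series from the recursion $a_\ell = -a_{\ell-2}/\ell(\ell+1)$ and sets it beside $\sin x/\sqrt{x}$.

```python
import math

def half_order_series(x, terms=20):
    """sqrt(x) * sum of a_{2k} x**(2k), with a_0 = 1 and a_l = -a_{l-2} / (l*(l+1))."""
    a, total = 1.0, 0.0
    for j in range(terms):
        total += a * x**(2 * j)
        l = 2 * (j + 1)
        a = -a / (l * (l + 1))
    return math.sqrt(x) * total

for x in (0.5, 1.0, 3.0):
    print(x, half_order_series(x), math.sin(x) / math.sqrt(x))   # columns should agree
```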


I’ll leave it to problem 4.15 for you to find the other solution. 

Do you need to use a Frobenius series instead of just a power series for all differential equations? No, but I
recommend it. If you are expanding about a regular point of the equation then a power series will work, but I find it
more systematic to use the same method for all cases. It’s less prone to error.

4.4 Some General Methods

It is important to be familiar with the arsenal of special methods that work on special types of differential equations. 
What if you encounter an equation that doesn't fit these special methods? There are some techniques that you should 
be familiar with, even if they are mostly not ones that you will want to use often. Here are a couple of methods that can 
get you started, and there’s a much broader set of approaches under the heading of numerical analysis; you can explore 
those in section 11.5. 

If you have a first order differential equation, $dx/dt = f(x,t)$, with initial condition $x(t_0) = x_0$ then you can
follow the spirit of the series method, computing successive orders in the expansion. Assume for now that the function
f is smooth, with as many derivatives as you want, then use the chain rule a lot to get the higher derivatives of x

$$\begin{aligned}
\frac{dx}{dt} &= f(x,t) \\
\frac{d^2x}{dt^2} &= \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x}\frac{dx}{dt} = f_t + f_x\dot x \\
\dddot x &= f_{tt} + 2f_{xt}\dot x + f_{xx}\dot x^2 + f_x\ddot x = f_{tt} + 2f_{xt}\dot x + f_{xx}\dot x^2 + f_x\left[f_t + f_x\dot x\right] \\
x(t) &= x_0 + f(x_0, t_0)(t - t_0) + \frac{1}{2}\ddot x(t_0)(t - t_0)^2 + \frac{1}{6}\dddot x(t_0)(t - t_0)^3 + \cdots
\end{aligned} \tag{4.24}$$


Here the dot-notation ($\dot x$ etc.) is a standard shorthand for derivative with respect to time. This is unlike using a prime
for derivative, which is with respect to anything you want. These equations show that once you have the initial data
$(t_0, x_0)$, you can compute the next derivatives from them and from the properties of f. Of course if f is complicated this
will quickly become a mess, but even then it can be useful to compute the first few terms in the power series expansion
of x.

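For example, $\dot x = f(x,t) = Ax^2(1 + \omega t)$ with $t_0 = 0$ and $x_0 = \alpha$.

$$\dot x_0 = A\alpha^2, \qquad \ddot x_0 = A\alpha^2\omega + 2A^2\alpha^3, \qquad \dddot x_0 = 4A^2\alpha^3\omega + 2A^3\alpha^4 + 2A\alpha\left[A\alpha^2\omega + 2A^2\alpha^3\right] \tag{4.25}$$

If $A = 1/\text{m}\cdot\text{s}$ and $\omega = 1/\text{s}$ with $\alpha = 1$ m this is

$$x(t) = 1 + t + \frac{3}{2}t^2 + 2t^3 + \cdots$$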


You can also solve this example exactly and compare the results to check the method. 
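
Here is one way that comparison might go, as a sketch. The closed form in it is my own separation-of-variables result, $-1/x + 1/\alpha = A(t + \omega t^2/2)$, so treat it as something to verify rather than as given. The code rebuilds the Taylor coefficients from Eq. (4.25) and compares the cubic polynomial with that closed form at small t.

```python
A, omega, alpha = 1.0, 1.0, 1.0     # the numerical case quoted in the text

def taylor_coefficients():
    """Coefficients of 1, t, t**2, t**3 built from the derivatives in Eq. (4.25)."""
    x1 = A * alpha**2
    x2 = A * alpha**2 * omega + 2 * A**2 * alpha**3
    x3 = (4 * A**2 * alpha**3 * omega + 2 * A**3 * alpha**4
          + 2 * A * alpha * (A * alpha**2 * omega + 2 * A**2 * alpha**3))
    return alpha, x1, x2 / 2, x3 / 6

def x_taylor(t):
    c0, c1, c2, c3 = taylor_coefficients()
    return c0 + c1 * t + c2 * t**2 + c3 * t**3

def x_exact(t):
    """Separation of variables (derived here): -1/x + 1/alpha = A*(t + omega*t**2/2)."""
    return 1.0 / (1.0 / alpha - A * (t + omega * t**2 / 2))

print(taylor_coefficients())
for t in (0.1, 0.01):
    print(t, x_exact(t), x_taylor(t))
```

The exact solution blows up at a finite time, so the polynomial can only be trusted well before that.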

What if you have a second order differential equation? Pretty much the same thing, though it is sometimes 
convenient to make a slight change in the appearance of the equations when you do this. 


$$\ddot x = f(x, \dot x, t) \qquad\text{can be written}\qquad \dot x = v, \quad \dot v = f(x, v, t) \tag{4.26}$$

so that it looks like two simultaneous first order equations. Either form will let you compute the higher derivatives, but
the second one often makes for a less cumbersome notation. You start by knowing $t_0$, $x_0$, and now $v_0 = \dot x_0$.

Some of the numerical methods you will find in chapter 11 start from the ideas of these expansions, but then 
develop them along different lines. 

There is an iterative method that is of more theoretical than practical importance, but it’s easy to understand. I’ll
write it for a first order equation, but you can rewrite it for the second (or higher) order case by doing the same thing
as in Eq. (4.26).

$$\dot x = f(x,t) \quad\text{with}\quad x(t_0) = x_0 \qquad\text{generates}\qquad x_1(t) = x_0 + \int_{t_0}^{t} dt'\, f(x_0, t')$$

This is not a solution of the differential equation, but it forms the starting point to find one because you can iterate this
approximate solution $x_1$ to form an improved approximation.

$$x_k(t) = x_0 + \int_{t_0}^{t} dt'\, f\big(x_{k-1}(t'), t'\big), \qquad k = 2, 3, \ldots \tag{4.27}$$

This will form a sequence that is usually different from that of the power series approach, though the end result had better
be the same. This iterative approach is used in one proof that shows under just what circumstances this differential
equation $\dot x = f$ has a unique solution.
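
To see the iteration produce something, take an example that is not in the text: $f(x,t) = x^2$ with $t_0 = 0$ and $x_0 = 1$, whose exact solution is $1/(1-t) = 1 + t + t^2 + t^3 + \cdots$. The sketch below stores polynomials as coefficient lists and carries the initial value $x_0$ along in each iterate so that every approximation satisfies the initial condition. The helper names are hypothetical.

```python
from fractions import Fraction

def multiply(p, q):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out

def integrate(p):
    """Integral from 0 to t of a polynomial."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(p)]

def picard_step(prev, x0):
    """x_k(t) = x0 + integral_0^t f(x_{k-1}(t')) dt' for the assumed f(x, t) = x**2."""
    out = integrate(multiply(prev, prev))
    out[0] += x0
    return out

x0 = Fraction(1)
x1 = picard_step([x0], x0)        # 1 + t
x2 = picard_step(x1, x0)          # 1 + t + t**2 + t**3/3
print(x1)
print(x2)
```

The second iterate matches the power series of the exact solution only through $t^2$; its $t^3$ coefficient is 1/3 instead of 1, which shows how this sequence differs from the term-by-term Taylor expansion.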

4.5 Trigonometry via ODE’s 

The differential equation u" = —u has two independent solutions. The point of this exercise is to derive all (or at least 
some) of the standard relationships for sines and cosines strictly from the differential equation. The reasons for spending 
some time on this are twofold. First, it's neat. Second, you have to get used to manipulating a differential equation in 
order to find properties of its solutions. This is essential in the study of Fourier series as you will see in section 5.3. 

Two solutions can be defined when you specify boundary conditions. Call the functions c(x) and s(x), and specify 
their respective boundary conditions to be 

$$c(0) = 1, \quad c'(0) = 0, \qquad\text{and}\qquad s(0) = 0, \quad s'(0) = 1 \tag{4.28}$$

What is $s'(x)$? First observe that $s'$ satisfies the same differential equation as s and c:

$$u'' = -u \quad\Longrightarrow\quad (u')'' = (u'')' = -u', \quad\text{and that shows the desired result.}$$

This in turn implies that $s'$ is a linear combination of s and c, as that is the most general solution to the original
differential equation.

$$s'(x) = Ac(x) + Bs(x)$$

Use the boundary conditions:

$$s'(0) = 1 = Ac(0) + Bs(0) = A$$

From the differential equation you also have

$$s''(0) = -s(0) = 0 = Ac'(0) + Bs'(0) = B$$

Put these together and you have

$$s'(x) = c(x) \qquad\text{And a similar calculation shows}\qquad c'(x) = -s(x) \tag{4.29}$$

What is $c(x)^2 + s(x)^2$? Differentiate this expression to get

$$\frac{d}{dx}\left[c(x)^2 + s(x)^2\right] = 2c(x)c'(x) + 2s(x)s'(x) = -2c(x)s(x) + 2s(x)c(x) = 0$$

This combination is therefore a constant. What constant? Just evaluate it at x = 0 and you see that the constant is 
one. There are many more such results that you can derive, but that's left for the exercises, problem 4.21 et seq. 
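
You can also let the differential equation speak for itself numerically. This sketch uses a classical fourth-order Runge-Kutta integrator, which is my choice of method (any reasonable integrator would do), to build c and s from $u'' = -u$ and the conditions (4.28) without ever calling a sine or cosine, and then prints the three combinations that the derivation says should vanish.

```python
import math

def rk4(u, up, x_end, steps=2000):
    """Integrate u'' = -u from x = 0 as the pair u' = w, w' = -u; returns (u, u')."""
    h = x_end / steps
    for _ in range(steps):
        k1u, k1w = up, -u
        k2u, k2w = up + 0.5 * h * k1w, -(u + 0.5 * h * k1u)
        k3u, k3w = up + 0.5 * h * k2w, -(u + 0.5 * h * k2u)
        k4u, k4w = up + h * k3w, -(u + h * k3u)
        u += h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        up += h * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
    return u, up

x = 1.3                                   # arbitrary test point
c, c_prime = rk4(1.0, 0.0, x)             # c(0) = 1, c'(0) = 0
s, s_prime = rk4(0.0, 1.0, x)             # s(0) = 0, s'(0) = 1
print(s_prime - c, c_prime + s, c * c + s * s - 1)
```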

4.6 Green’s Functions 

Is there a general way to find the solution to the whole harmonic oscillator inhomogeneous differential equation? One 
that does not require guessing the form of the solution and applying initial conditions? Yes there is. It's called the 
method of Green's functions. The idea behind it is that you can think of any force as a sequence of short, small kicks. 
In fact, because of the atomic nature of matter, that's not so far from the truth. If you can figure out the result of an 
impact by one molecule, you can add the results of many such kicks to get the answer for $10^{23}$ molecules.

I'll start with the simpler case where there’s no damping, b = 0 in the harmonic oscillator equation. 

$$m\ddot x + kx = F_{\text{ext}}(t) \tag{4.30}$$

Suppose that everything is at rest at the origin and then at time $t'$ the external force provides a small impulse. The
motion from that point on will be a sine function starting at $t'$,

$$A\sin\big(\omega_0(t - t')\big) \qquad (t > t') \tag{4.31}$$

The amplitude will depend on the strength of the kick. A constant force F applied for a very short time, $\Delta t'$, will change
the momentum of the mass by $m\Delta v_x = F\Delta t'$. If this time interval is short enough the mass doesn't have a chance to
move very far before the force is turned off; from that time on it's subject only to the $-kx$ force. This kick gives
m a velocity $F\Delta t'/m$, and that’s what determines the unknown constant A.

Just after $t = t'$, $v_x = A\omega_0 = F\Delta t'/m$. This determines A, so the position of m is

$$x(t) = \begin{cases} \dfrac{F\Delta t'}{m\omega_0}\sin\big(\omega_0(t - t')\big) & (t > t') \\ 0 & (t \le t') \end{cases} \tag{4.32}$$



When the external force is the sum of two terms, the total solution is the sum of the solutions for the individual 
forces. If an impulse at one time gives a solution Eq. (4.32), an impulse at a later time gives a solution that starts its 
motion at that later time. The key fact about the equation that you're trying to solve is that it is linear, so you can get 
the solution for two impulses simply by adding the two simpler solutions. 


If

$$m\frac{d^2x_1}{dt^2} + kx_1 = F_1(t) \qquad\text{and}\qquad m\frac{d^2x_2}{dt^2} + kx_2 = F_2(t)$$

then

$$m\frac{d^2(x_1 + x_2)}{dt^2} + k(x_1 + x_2) = F_1(t) + F_2(t)$$

[Figure: the response $x_1$ to one impulse, the response $x_2$ to a later impulse, and their sum $x_1 + x_2$.]

The way to make use of this picture is to take a sequence of contiguous steps. One step follows immediately after
the preceding one. If two such impulses are two steps

$$F_0 = \begin{cases} F(t_0) & (t_0 < t < t_1) \\ 0 & (\text{elsewhere}) \end{cases} \qquad\text{and}\qquad F_1 = \begin{cases} F(t_1) & (t_1 < t < t_2) \\ 0 & (\text{elsewhere}) \end{cases}$$

$$m\ddot{x} + kx = F_0 + F_1 \tag{4.33}$$

then if $x_0$ is the solution to Eq. (4.30) with only the $F_0$ on its right, and $x_1$ is the solution with only $F_1$, then
the full solution to Eq. (4.33) is the sum, $x_0 + x_1$.

Think of a general forcing function $F_{x,\text{ext}}(t)$ in the way that you would set up an integral. Approximate it as a
sequence of very short steps as in the picture. Between $t_k$ and $t_{k+1}$ the force is essentially $F(t_k)$. The response
of $m$ to this piece of the total force is then Eq. (4.32).

$$x_k(t) = \begin{cases} \dfrac{F(t_k)\Delta t_k}{m\omega_0}\sin\bigl(\omega_0(t - t_k)\bigr) & (t > t_k) \\ 0 & (t \le t_k) \end{cases}$$

where $\Delta t_k = t_{k+1} - t_k$.



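[Figure: the force approximated by a sequence of short steps, written as a sum of the individual steps.]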


To complete this idea, the external force is the sum of a lot of terms, the force between $t_1$ and $t_2$, that between
$t_2$ and $t_3$, etc. The total response is the sum of all these individual responses.

$$x(t) = \sum_k \begin{cases} \dfrac{F(t_k)\Delta t_k}{m\omega_0}\sin\bigl(\omega_0(t - t_k)\bigr) & (t > t_k) \\ 0 & (t < t_k) \end{cases}$$

For a specified time $t$, only the times $t_k$ before and up to $t$ contribute to this sum. The impulses occurring at the
times after the time $t$ can't change the value of $x(t)$; they haven't happened yet. In the limit that $\Delta t_k \to 0$,
this sum becomes an integral.

$$x(t) = \int_{-\infty}^{t} dt'\,\frac{F(t')}{m\omega_0}\sin\bigl(\omega_0(t - t')\bigr) \tag{4.34}$$



Apply this to an example. The simplest is to start at rest and begin applying a constant force from time zero on. 


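$$F_\text{ext}(t) = \begin{cases} F_0 & (t > 0) \\ 0 & (t < 0) \end{cases} \qquad\qquad x(t) = \int_0^t dt'\,\frac{F_0}{m\omega_0}\sin\bigl(\omega_0(t - t')\bigr)$$

and the last expression applies only for $t > 0$. It is

$$x(t) = \frac{F_0}{m\omega_0^2}\bigl[1 - \cos(\omega_0 t)\bigr] \tag{4.35}$$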


As a check for the plausibility of this result, look at the special case of small times. Use the power series expansion of 
the cosine, keeping a couple of terms, to get 




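$$x(t) \approx \frac{F_0}{m\omega_0^2}\left[1 - \left(1 - \frac{(\omega_0 t)^2}{2} + \cdots\right)\right] \approx \frac{F_0}{m\omega_0^2}\,\frac{\omega_0^2t^2}{2} = \frac{F_0}{m}\,\frac{t^2}{2}$$

and this is just the result you'd get for constant acceleration $F_0/m$. In this short time, the position hasn't changed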
much from zero, so the spring hasn’t had a chance to stretch very far, so it can't apply much force, and you have nearly 
constant acceleration. 
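
The picture of a force as a sequence of short kicks translates directly into a numerical sum. The sketch below is an
illustration only: the function name `green_response`, the midpoint rule, and the numerical values of m, k, and F_0 are
assumptions made for the example. It adds up the responses to many short kicks, as in Eq. (4.34), and compares the total
with the closed form of Eq. (4.35) for a constant force switched on at time zero.

```python
import math

def green_response(force, t, m, omega0, t_start=0.0, steps=4000):
    """Approximate x(t) = integral of F(t') sin(omega0 (t - t')) / (m omega0) dt'
    over t_start < t' < t, using the midpoint rule.
    Assumes the force vanishes before t_start."""
    if t <= t_start:
        return 0.0
    dt = (t - t_start) / steps
    total = 0.0
    for k in range(steps):
        tk = t_start + (k + 0.5) * dt      # time of the k-th short kick
        total += force(tk) * math.sin(omega0 * (t - tk)) * dt
    return total / (m * omega0)

m, k_spring, F0 = 2.0, 8.0, 3.0            # illustrative values
omega0 = math.sqrt(k_spring / m)
t = 1.7

numeric = green_response(lambda tp: F0, t, m, omega0)
exact = F0 / (m * omega0**2) * (1 - math.cos(omega0 * t))
print(numeric, exact)
```

The same function accepts any other force law in place of the constant, which is the point of the method: the kernel
stays the same and only F(t') changes.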

This is a sufficiently important subject that it will be repeated elsewhere in this text. A completely different 
approach to Green's functions will appear in section 15.5, and chapter 17 is largely devoted to the subject.

4.7 Separation of Variables 

If you have a first order differential equation — I'll be more specific for an example, in terms of x and t — and if you 
are able to move the variables around until everything involving x and dx is on one side of the equation and everything 
involving t and dt is on the other side, then you have “separated variables.” Now all you have to do is integrate. 

For example, the total energy in the undamped harmonic oscillator is $E = mv^2/2 + kx^2/2$. Solve for $dx/dt$ and

$$\frac{dx}{dt} = \sqrt{\frac{2}{m}\left(E - kx^2/2\right)}$$

To separate variables, multiply by $dt$ and divide by the right-hand side.

$$\frac{dx}{\sqrt{\frac{2}{m}\left(E - kx^2/2\right)}} = dt \tag{4.36}$$

Now it's just manipulation to put this into a convenient form to integrate.

$$\sqrt{\frac{m}{k}}\,\frac{dx}{\sqrt{(2E/k) - x^2}} = dt, \qquad\text{or}\qquad \frac{dx}{\sqrt{(2E/k) - x^2}} = \sqrt{\frac{k}{m}}\,dt$$

Make the substitution $x = a\sin\theta$ and you see that if $a^2 = 2E/k$ then the integral on the left simplifies.

$$\int\frac{a\cos\theta\,d\theta}{a\sqrt{1 - \sin^2\theta}} = \int\sqrt{\frac{k}{m}}\,dt \qquad\text{so}\qquad \theta = \sin^{-1}\frac{x}{a} = \omega_0 t + C$$

or $x(t) = a\sin(\omega_0 t + C)$ where $\omega_0 = \sqrt{k/m}$.

An electric circuit with an inductor, a resistor, and a battery has a differential equation for the current flow: 




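$$L\frac{dI}{dt} + IR = V_0$$

Manipulate this into

$$L\frac{dI}{dt} = V_0 - IR, \qquad\text{then}\qquad L\,\frac{dI}{V_0 - IR} = dt$$

Now integrate this to get

$$L\int\frac{dI}{V_0 - IR} = t + C, \qquad\text{or}\qquad -\frac{L}{R}\ln(V_0 - IR) = t + C \tag{4.37}$$

Solve for the current $I$ to get

$$R\,I(t) = V_0 - e^{-(R/L)(t + C)} \tag{4.38}$$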

Now does this make sense? Look at the dimensions and you see that it doesn't, at least not yet. The problem is the 
logarithm on the preceding line where you see that its units don’t make sense either. How can this be? The differential 
equation that you started with is correct, so how did the units get messed up? It goes back to the standard equation for 
integration, 

$$\int dx/x = \ln x + C$$

If $x$ is a length for example, then the left side is dimensionless, but this right side is the logarithm of a length. It's a
peculiarity of the logarithm that leads to this anomaly. You can write the constant of integration as $C = -\ln C'$ where
$C'$ is another arbitrary constant, then

$$\int dx/x = \ln x + C = \ln x - \ln C' = \ln\frac{x}{C'}$$

If C' is a length this is perfectly sensible dimensionally. To see that the dimensions in Eq. (4.38) will work themselves 
out (this time), put on some initial conditions. Set $I(0) = 0$ so that the circuit starts with zero current.

$$R\cdot 0 = V_0 - e^{-(R/L)(0 + C)} \quad\text{implies}\quad e^{-(R/L)C} = V_0$$
$$R\,I(t) = V_0 - V_0e^{-Rt/L} \quad\text{or}\quad I(t) = \left(1 - e^{-Rt/L}\right)V_0/R$$

and somehow the units have worked themselves out. Logarithms do this, but you'd still better check. The current in the
circuit starts at zero and climbs gradually to its final value $I = V_0/R$.
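
One more way to check a result like this is to integrate the differential equation directly and compare. The sketch
below assumes the standard solution I(t) = (V_0/R)(1 - e^{-Rt/L}), whose time constant L/R makes the exponent
dimensionless. The function names and the circuit values are illustrative assumptions, not taken from the text.

```python
import math

def rl_current_numeric(V0, R, L, t_end, steps=1000):
    """Integrate L dI/dt + I R = V0 from I(0) = 0 with RK4."""
    h = t_end / steps
    rate = lambda current: (V0 - current * R) / L   # dI/dt from the circuit equation
    I = 0.0
    for _ in range(steps):
        k1 = rate(I)
        k2 = rate(I + 0.5 * h * k1)
        k3 = rate(I + 0.5 * h * k2)
        k4 = rate(I + h * k3)
        I += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return I

def rl_current_exact(V0, R, L, t):
    """Closed form with I(0) = 0; R t / L is dimensionless."""
    return (V0 / R) * (1 - math.exp(-R * t / L))

V0, R, L = 12.0, 4.0, 2.0        # illustrative values (volts, ohms, henries)
print(rl_current_numeric(V0, R, L, 0.9), rl_current_exact(V0, R, L, 0.9))
print(rl_current_exact(V0, R, L, 50.0), V0 / R)   # long-time limit approaches V0/R
```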


4.8 Circuits 

The methods of section 4.1 apply to simple linear circuits, and the use of complex algebra as in that section leads to 
powerful and simple ways to manipulate such circuit equations. You probably remember the result of putting two resistors 
in series or in parallel, but what about combinations of capacitors or inductors under the same circumstances? And what 
if you have some of each? With the right tools, all of these questions become the same question, so it’s not several 
different techniques, but one. 

If you have an oscillating voltage source (a wall plug), and you apply it to a resistor or to a capacitor or to an 
inductor, what happens? In the first case, $V = IR$ of course, but what about the others? The voltage equation for a
capacitor is $V = q/C$ and for an inductor it is $V = L\,dI/dt$. A voltage that oscillates at frequency $\omega$ is
$V = V_0\cos\omega t$, but using this trigonometric function forgoes all the advantages that complex exponentials provide.
Instead, assume that your voltage source is $V = V_0e^{i\omega t}$ with the real part understood. Carry this exponential
through the calculation, and take the real part only at the end; often you won't even need to do that.


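[Figure: three circuits in which the source $V_0e^{i\omega t}$ drives a resistor, a capacitor, and an inductor.]

These are respectively

$$V_0e^{i\omega t} = IR = I_0e^{i\omega t}R$$
$$V_0e^{i\omega t} = q/C \implies i\omega V_0e^{i\omega t} = \dot{q}/C = I/C = I_0e^{i\omega t}/C$$
$$V_0e^{i\omega t} = L\dot{I} = i\omega LI = i\omega LI_0e^{i\omega t}$$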

In each case the exponential factor is in common, and you can cancel it. These equations are then 

$$V = IR \qquad V = I/i\omega C \qquad V = i\omega L\,I$$

All three of these have the same form: $V$ = (something) times $I$, and in each case the size of the current is
proportional to the applied voltage. The factors of $i$ imply that in the second and third cases the current is ±90° out
of phase with the voltage cycle.

The coefficients in these equations generalize the concept of resistance, and they are called “impedance,” respectively
resistive impedance, capacitive impedance, and inductive impedance.

$$V = Z_RI = RI \qquad V = Z_CI = \frac{1}{i\omega C}I \qquad V = Z_LI = i\omega LI \tag{4.39}$$

Impedance appears in the same place as does resistance in the direct current situation, and this implies that it can be 
manipulated in the same way. The left figure shows two impedances in series. 



The total voltage from left to right in the left picture is 

$$V = Z_1I + Z_2I = (Z_1 + Z_2)I = Z_\text{total}I \tag{4.40}$$

It doesn't matter if what's inside the box is a resistor or some more complicated impedance, it matters only that each
box obeys $V = ZI$ and that the total voltage from left to right is the sum of the two voltages. Impedances in series
add. You don't need the common factor $e^{i\omega t}$.

For the second picture, for which the components are in parallel, the voltage is the same on each impedance and 
charge is conserved, so the current entering the circuit obeys 

$$I = I_1 + I_2, \quad\text{then}\quad \frac{V}{Z_\text{total}} = \frac{V}{Z_1} + \frac{V}{Z_2} \quad\text{or}\quad \frac{1}{Z_\text{total}} = \frac{1}{Z_1} + \frac{1}{Z_2} \tag{4.41}$$

Impedances in parallel add as reciprocals, so both of these formulas generalize the common equations for resistors in
series and parallel. They also include as a special case the formula you may have seen before for adding capacitors in
series and parallel.
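
Because an impedance is just a complex number, the two combination rules fit in a few lines of code. The sketch below
uses Python's built-in complex type; the helper names (`z_resistor`, `z_capacitor`, `z_inductor`, `series`, `parallel`)
and the component values are hypothetical, chosen only for this illustration.

```python
def z_resistor(R):
    """Resistive impedance Z_R = R."""
    return complex(R, 0.0)

def z_capacitor(C, omega):
    """Capacitive impedance Z_C = 1 / (i omega C)."""
    return 1 / (1j * omega * C)

def z_inductor(L, omega):
    """Inductive impedance Z_L = i omega L."""
    return 1j * omega * L

def series(*zs):
    """Impedances in series add."""
    return sum(zs)

def parallel(*zs):
    """Impedances in parallel add as reciprocals."""
    return 1 / sum(1 / z for z in zs)

omega = 1000.0                  # illustrative angular frequency
C1, C2 = 2e-6, 3e-6             # illustrative capacitances
# Two capacitors in parallel behave like a single capacitor C1 + C2.
print(parallel(z_capacitor(C1, omega), z_capacitor(C2, omega)),
      z_capacitor(C1 + C2, omega))
# A resistor and an inductor in series give R + i omega L.
print(series(z_resistor(3.0), z_inductor(0.5, 8.0)))
```

Mixed networks need nothing new: nest `series` and `parallel` calls in the same pattern as the circuit diagram.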

In the example Eq. (4.37), if you replace the constant voltage by an oscillating voltage, you have two impedances 
in series. 

$$Z_\text{total} = Z_R + Z_L = R + i\omega L \qquad\qquad I = V/(R + i\omega L)$$

What happened to the $e^{-Rt/L}$ term of the previous solution? This impedance manipulation tells you the inhomogeneous
solution; you still must solve the homogeneous part of the differential equation and add that.

$$L\frac{dI}{dt} + IR = 0 \implies I(t) = Ae^{-Rt/L}$$



The total solution is the sum 


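$$I(t) = Ae^{-Rt/L} + V_0e^{i\omega t}\frac{1}{R + i\omega L}$$

whose real part is

$$Ae^{-Rt/L} + V_0\frac{\cos(\omega t - \phi)}{\sqrt{R^2 + \omega^2L^2}} \qquad\text{where}\qquad \phi = \tan^{-1}\frac{\omega L}{R} \tag{4.42}$$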


How did that last manipulation come about? Change the complex number $R + i\omega L$ in the denominator from rectangular
to polar form. Then the division of the complex numbers becomes easy. The dying exponential is called the “transient” 
term, and the other term is the “steady-state” term. 

The denominator is 


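$$R + i\omega L = \alpha + i\beta = \sqrt{\alpha^2 + \beta^2}\,\frac{\alpha + i\beta}{\sqrt{\alpha^2 + \beta^2}} \tag{4.43}$$

The reason for this multiplication and division by the same factor is that it makes the final fraction have magnitude
one. That allows me to write it as an exponential, $e^{i\phi}$. From the picture, the cosine and the sine of the angle
$\phi$ are the two terms in the fraction.

[Figure: right triangle with legs $\alpha$ and $\beta$, hypotenuse $\sqrt{\alpha^2 + \beta^2}$, and angle $\phi$.]

$$\alpha + i\beta = \sqrt{\alpha^2 + \beta^2}\,(\cos\phi + i\sin\phi) = \sqrt{\alpha^2 + \beta^2}\,e^{i\phi} \qquad\text{and}\qquad \tan\phi = \beta/\alpha$$

In summary,

$$V = IZ \implies I = \frac{V}{Z}, \qquad Z = |Z|e^{i\phi} \implies I = \frac{V}{\sqrt{R^2 + \omega^2L^2}\;e^{i\phi}}$$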

To satisfy initial conditions, you need the parameter A, but you also see that it gives a dying exponential. After 
some time this transient term will be negligible, and only the oscillating steady-state term is left. That is what this 
impedance idea provides. 

In even the simplest circuits such as these, the fact that $Z$ is complex implies that the applied voltage is out of
phase with the current. $Z = |Z|e^{i\phi}$, so $I = V/Z$ has a phase change of $-\phi$ from $V$.
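
The rectangular-to-polar step is exactly what `cmath.polar` does, so the amplitude and phase lag of the steady-state
current can be read off directly. This is a small sketch with made-up component values (R = 3, L = 0.5, omega = 8,
V_0 = 10 in consistent units); the function name `steady_state_current` is likewise an assumption for the example.

```python
import cmath
import math

R, L, omega, V0 = 3.0, 0.5, 8.0, 10.0    # illustrative values
Z = complex(R, omega * L)                 # Z = R + i omega L
magnitude, phi = cmath.polar(Z)           # Z = |Z| e^{i phi}

def steady_state_current(t):
    """Real part of V0 e^{i omega t} / Z."""
    return (V0 * cmath.exp(1j * omega * t) / Z).real

t = 0.37
print(magnitude, math.sqrt(R**2 + (omega * L)**2))    # the two ways of getting |Z|
print(phi, math.atan2(omega * L, R))                  # the two ways of getting phi
print(steady_state_current(t), V0 * math.cos(omega * t - phi) / magnitude)
```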

What if you have more than one voltage source, perhaps the second having a different frequency from the first? 
Remember that you're just solving an inhomogeneous differential equation, and you are using the methods of section 
4.2. If the external force in Eq. (4.12) has two terms, you can handle them separately then add the results. 


4.9 Simultaneous Equations 

What's this doing in a chapter on differential equations? Patience. Solve two equations in two unknowns: 

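$$\text{(X)}\quad ax + by = e \qquad\qquad \text{(Y)}\quad cx + dy = f$$

Multiply (X) by $d$ and (Y) by $b$ and subtract:

$$adx + bdy - bcx - bdy = ed - fb$$
$$(ad - bc)x = ed - fb$$

Similarly, multiply (Y) by $a$ and (X) by $c$ and subtract:

$$acx + ady - acx - cby = fa - ec$$
$$(ad - bc)y = fa - ec$$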


Divide by the factor on the left side and you have 

$$x = \frac{ed - fb}{ad - bc}, \qquad y = \frac{fa - ec}{ad - bc} \tag{4.44}$$

provided that $ad - bc \ne 0$. This expression appearing in both denominators is the determinant of the equations.

Classify all the essentially different cases that can occur with this simple-looking set of equations and draw graphs
to illustrate them. If this looks like problem 1.23, it should.

1. The solution is just as in Eq. (4.44) above and nothing goes wrong. There is exactly one solution. The two
graphs of the two equations are two intersecting straight lines.

2. The denominator, the determinant, is zero and the numerator isn't. This is impossible and there are no
solutions. When the determinant vanishes, the two straight lines are parallel and the fact that the numerator isn't zero
implies that the two lines are distinct and never intersect. (This could also happen if in one of the equations, say (X),
$a = b = 0$ and $e \ne 0$. For example $0 = 1$. This obviously makes no sense.)

3a. The determinant is zero and so are both numerators. In this case the two lines are not only parallel, they are 
the same line. The two equations are not really independent and you have an infinite number of solutions. 

3b. You can get zero over zero another way. Both equations (X) and (Y) are 0 = 0. This sounds trivial, but it 
can really happen. Every x and y will satisfy the equation. 

4. Not strictly a different case, but sufficiently important to discuss it separately: suppose that the right-hand
sides of (X) and (Y) are zero, $e = f = 0$. If the determinant is non-zero, there is a unique solution and it is $x = 0$,
$y = 0$.

5. With $e = f = 0$, if the determinant is zero, the two equations are the same equation and there are an infinite
number of non-zero solutions.

In the important case for which $e = f = 0$ and the determinant is zero, there are two cases: (3b) and (5). In
the latter case there is a one-parameter family of solutions and in the former case there is a two-parameter family. Put
another way, for case (5) the set of all solutions is a straight line, a one-dimensional set. For case (3b) the set of all
solutions is the whole plane, a two-dimensional set.
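
The case analysis above can be written as a short decision procedure. The sketch below is one possible arrangement, not
the only one; the function name `classify` and its return labels are invented for this illustration. It compares with
zero exactly, so it is meant for integer or rational coefficients; floating-point inputs would need a tolerance.

```python
def classify(a, b, c, d, e, f):
    """Classify the solutions of  a x + b y = e,  c x + d y = f."""
    det = a * d - b * c
    if det != 0:
        # Case 1 (and case 4 when e = f = 0): exactly one solution, Eq. (4.44).
        return ("unique", ((e * d - f * b) / det, (f * a - e * c) / det))
    # From here on the determinant is zero.
    if (a == b == 0 and e != 0) or (c == d == 0 and f != 0):
        return ("none", None)        # an equation reads 0 = (non-zero)
    if a == b == c == d == 0:
        return ("plane", None)       # case 3b: both equations read 0 = 0
    if e * d - f * b == 0 and f * a - e * c == 0:
        return ("line", None)        # cases 3a and 5: the equations describe one line
    return ("none", None)            # case 2: distinct parallel lines

print(classify(1, 1, 1, -1, 2, 0))    # two intersecting lines
print(classify(1, 2, 2, 4, 1, 3))     # parallel, distinct
print(classify(-1, -2, 2, 4, 0, 0))   # same line twice
print(classify(0, 0, 0, 0, 0, 0))     # 0 = 0 twice
```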



Example: consider the two equations 



$$kx + (k - 1)y = 0, \qquad (1 - k)x + (k - 1)^2y = 0$$

For whatever reason, I would like to get a non-zero solution for $x$ and $y$. Can I? The condition depends on the
determinant, so take the determinant and set it equal to zero.

$$k(k - 1)^2 - (1 - k)(k - 1) = 0, \qquad\text{or}\qquad (k + 1)(k - 1)^2 = 0$$

There are two roots, $k = -1$ and $k = +1$. In the $k = -1$ case the two equations become

$$-x - 2y = 0, \qquad\text{and}\qquad 2x + 4y = 0$$

The second is just $-2$ times the first, so it isn't a separate equation. The family of solutions is all those $x$ and $y$
satisfying $x = -2y$, a straight line.

In the $k = +1$ case you have

$$x + 0y = 0, \qquad\text{and}\qquad 0 = 0$$

The solution to this is $x = 0$ and $y$ = anything and it is again a straight line (the $y$-axis).


4.10 Simultaneous ODE’s 

Single point masses are an idealization that has some application to the real world, but there are many more cases for 
which you need to consider the interactions among many masses. To approach this, take the first step, from one mass 
to two masses. 


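[Figure: masses $m_1$ and $m_2$ on a line between two rigid walls; spring $k_1$ joins the left wall to $m_1$, spring $k_3$
joins the two masses, and spring $k_2$ joins $m_2$ to the right wall. The coordinates are $x_1$ and $x_2$.]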

Two masses are connected to a set of springs and fastened between two rigid walls as shown. The coordinates for 
the two masses (moving along a straight line) are $x_1$ and $x_2$, and I'll pick the zero point for these coordinates to be
the positions at which everything is at equilibrium: no total force on either. When a mass moves away from its equilibrium
position there is a force on it. On $m_1$, the two forces are proportional to the distance by which the two springs $k_1$
and $k_3$ are stretched. These two distances are $x_1$ and $x_1 - x_2$ respectively, so $F_x = ma_x$ applied to each mass
gives the equations

$$m_1\frac{d^2x_1}{dt^2} = -k_1x_1 - k_3(x_1 - x_2), \qquad\text{and}\qquad m_2\frac{d^2x_2}{dt^2} = -k_2x_2 - k_3(x_2 - x_1) \tag{4.45}$$


I’m neglecting friction simply to keep the algebra down. These are linear, constant coefficient, homogeneous equations, 
just the same sort as Eq. (4.4) except that there are two of them. What made the solution of (4.4) easy is that the 
derivative of an exponential is an exponential, so that when you substituted $x(t) = Ae^{\alpha t}$ all that you were left
with was an algebraic factor, a quadratic equation in $\alpha$. Exactly the same method works here.

The only way to find out if this is true is to try it. The big difference is that there are two unknowns instead of 
one, and the amplitude of the two motions will probably not be the same. If one mass is a lot bigger than the other, you 
expect it to move less.

Try the solution

$$x_1(t) = Ae^{\alpha t}, \qquad x_2(t) = Be^{\alpha t} \tag{4.46}$$

When you plug this into the differential equations for the masses, all the factors of $e^{\alpha t}$ cancel, just the way it
happens in the one variable case.

$$m_1\alpha^2A = -k_1A - k_3(A - B), \qquad\text{and}\qquad m_2\alpha^2B = -k_2B - k_3(B - A) \tag{4.47}$$

Rearrange these to put them into a neater form.

$$\begin{aligned} (k_1 + k_3 + m_1\alpha^2)A + (-k_3)B &= 0 \\ (-k_3)A + (k_2 + k_3 + m_2\alpha^2)B &= 0 \end{aligned} \tag{4.48}$$

The results of problem 1.23 and of section 4.9 tell you all about such equations. In particular, for the pair of 
equations ax + by = 0 and cx + dy = 0, the only way to have a non-zero solution for x and y is for the determinant 
of the coefficients to be zero: $ad - bc = 0$. Apply this result to the problem at hand. Either $A = 0$ and $B = 0$ with a
trivial solution or the determinant is zero.

$$(k_1 + k_3 + m_1\alpha^2)(k_2 + k_3 + m_2\alpha^2) - (k_3)^2 = 0 \tag{4.49}$$

This is a quadratic equation for $\alpha^2$, and it determines the frequencies of the oscillation. Note the plural in the
word frequencies.

Equation (4.49) is just a quadratic, but it's still messy. For a first example, try a special, symmetric case:
$m_1 = m_2 = m$ and $k_1 = k_2$. There's a lot less algebra.

$$(k_1 + k_3 + m\alpha^2)^2 - (k_3)^2 = 0 \tag{4.50}$$

You could use the quadratic formula on this, but why? It's already set up to be factored.

$$(k_1 + k_3 + m\alpha^2 - k_3)(k_1 + k_3 + m\alpha^2 + k_3) = 0$$

The product is zero, so one or the other factor is zero. These determine the $\alpha$'s.

$$\alpha_1^2 = -\frac{k_1}{m} \qquad\text{and}\qquad \alpha_2^2 = -\frac{k_1 + 2k_3}{m} \tag{4.51}$$


These are negative, and that's what you should expect. There's no damping and the springs provide restoring forces that
should give oscillations. That's just what these imaginary $\alpha$'s provide.

When you examine the equations $ax + by = 0$ and $cx + dy = 0$ the condition that the determinant vanishes is the
condition that the two equations are really just one equation, and that the other is not independent of it; it is actually
a multiple of the first. You must solve that equation for $x$ and $y$. Here, arbitrarily pick the first of the equations
(4.48) and find the relation between $A$ and $B$.

$$\alpha_1^2 = -\frac{k_1}{m} \implies \bigl(k_1 + k_3 + m(-(k_1/m))\bigr)A + (-k_3)B = 0 \implies B = A$$
$$\alpha_2^2 = -\frac{k_1 + 2k_3}{m} \implies \bigl(k_1 + k_3 + m(-(k_1 + 2k_3)/m)\bigr)A + (-k_3)B = 0 \implies B = -A$$

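For the first case, $\alpha_1 = \pm i\omega_1 = \pm i\sqrt{k_1/m}$, there are two solutions to the original differential
equations. These are called “normal modes.”

$$\begin{aligned} x_1(t) &= A_1e^{i\omega_1t} \\ x_2(t) &= A_1e^{i\omega_1t} \end{aligned} \qquad\text{and}\qquad \begin{aligned} x_1(t) &= A_2e^{-i\omega_1t} \\ x_2(t) &= A_2e^{-i\omega_1t} \end{aligned}$$

The other frequency has the corresponding solutions

$$\begin{aligned} x_1(t) &= A_3e^{i\omega_2t} \\ x_2(t) &= -A_3e^{i\omega_2t} \end{aligned} \qquad\text{and}\qquad \begin{aligned} x_1(t) &= A_4e^{-i\omega_2t} \\ x_2(t) &= -A_4e^{-i\omega_2t} \end{aligned}$$

The total solution to the differential equations is the sum of all four of these.

$$\begin{aligned} x_1(t) &= A_1e^{i\omega_1t} + A_2e^{-i\omega_1t} + A_3e^{i\omega_2t} + A_4e^{-i\omega_2t} \\ x_2(t) &= A_1e^{i\omega_1t} + A_2e^{-i\omega_1t} - A_3e^{i\omega_2t} - A_4e^{-i\omega_2t} \end{aligned} \tag{4.52}$$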

The two second order differential equations have four arbitrary constants in their solution. You can specify the 
initial values of two positions and of two velocities this way. As a specific example suppose that all initial velocities are 
zero and that the first mass is pushed to coordinate $x_0$ and released.

$$\begin{aligned} x_1(0) &= x_0 = A_1 + A_2 + A_3 + A_4 \\ x_2(0) &= 0 = A_1 + A_2 - A_3 - A_4 \\ v_{x1}(0) &= 0 = i\omega_1A_1 - i\omega_1A_2 + i\omega_2A_3 - i\omega_2A_4 \\ v_{x2}(0) &= 0 = i\omega_1A_1 - i\omega_1A_2 - i\omega_2A_3 + i\omega_2A_4 \end{aligned} \tag{4.53}$$

With a little thought (i.e. don't plunge blindly ahead) you can solve these easily.

$$A_1 = A_2 = A_3 = A_4 = \frac{x_0}{4}$$
$$x_1(t) = \frac{x_0}{4}\left[e^{i\omega_1t} + e^{-i\omega_1t} + e^{i\omega_2t} + e^{-i\omega_2t}\right] = \frac{x_0}{2}\left[\cos\omega_1t + \cos\omega_2t\right]$$
$$x_2(t) = \frac{x_0}{4}\left[e^{i\omega_1t} + e^{-i\omega_1t} - e^{i\omega_2t} - e^{-i\omega_2t}\right] = \frac{x_0}{2}\left[\cos\omega_1t - \cos\omega_2t\right]$$

From the results of problem 3.34, you can rewrite these as 


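$$\begin{aligned} x_1(t) &= x_0\cos\left(\frac{\omega_2 + \omega_1}{2}t\right)\cos\left(\frac{\omega_2 - \omega_1}{2}t\right) \\ x_2(t) &= x_0\sin\left(\frac{\omega_2 + \omega_1}{2}t\right)\sin\left(\frac{\omega_2 - \omega_1}{2}t\right) \end{aligned} \tag{4.54}$$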


As usual you have to draw some graphs to understand what these imply. If the center spring k 3 is a lot weaker 
than the outer ones, then Eq. (4.51) implies that the two frequencies are close to each other and so
$|\omega_1 - \omega_2| \ll \omega_1 + \omega_2$. Examine Eq. (4.54) and you see that one of the two oscillating factors
oscillates at a much higher frequency than the other. To sketch the graph of $x_2$ for example you should draw one factor
$[\sin((\omega_2 + \omega_1)t/2)]$ and the other factor $[\sin((\omega_2 - \omega_1)t/2)]$ and graphically multiply them.



The mass $m_2$ starts without motion and its oscillations gradually build up. Later they die down and build up
again (though with reversed phase). Look at the other mass, governed by the equation for $x_1(t)$ and you see that the
low frequency oscillation from the $(\omega_2 - \omega_1)/2$ part is big where the one for $x_2$ is small and vice versa. The oscillation
energy moves back and forth from one mass to the other. 
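
A quick numerical comparison confirms that the sum-of-cosines form and the product form of Eq. (4.54) describe the same
motion, and shows where the first mass momentarily stops oscillating. The function names and the frequencies
(10 and 11 in arbitrary units) are assumptions made for this sketch.

```python
import math

def sum_form(t, x0, w1, w2):
    """x1 and x2 written as sums of the two normal-mode cosines."""
    x1 = 0.5 * x0 * (math.cos(w1 * t) + math.cos(w2 * t))
    x2 = 0.5 * x0 * (math.cos(w1 * t) - math.cos(w2 * t))
    return x1, x2

def product_form(t, x0, w1, w2):
    """x1 and x2 written as a fast oscillation times a slow envelope, Eq. (4.54)."""
    fast = 0.5 * (w2 + w1) * t
    slow = 0.5 * (w2 - w1) * t
    return (x0 * math.cos(fast) * math.cos(slow),
            x0 * math.sin(fast) * math.sin(slow))

x0, w1, w2 = 1.0, 10.0, 11.0        # weak coupling: nearly equal frequencies
t_swap = math.pi / (w2 - w1)        # here the slow envelope of x1 passes through zero

for t in (0.0, 0.3, 1.7, t_swap):
    print(t, sum_form(t, x0, w1, w2), product_form(t, x0, w1, w2))
```

At `t_swap` the envelope of $x_1$ vanishes while that of $x_2$ is at its maximum, which is the energy transfer described
above.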




4.11 Legendre’s Equation 

This equation and its solutions appear when you solve electric and gravitational potential problems in spherical coordinates 
[problem 9.20]. They appear when you study Gauss’s method of numerical integration [Eq. (11.27)] and they appear 
when you analyze orthogonal functions [problem 6.7]. Because it shows up so often it is worth the time to go through 
the details in solving it. 


$$[(1 - x^2)y']' + Cy = 0, \qquad\text{or}\qquad (1 - x^2)y'' - 2xy' + Cy = 0 \tag{4.55}$$

Assume a Frobenius solution about $x = 0$

$$y = \sum_{k=0}^{\infty}a_kx^{k+s}$$

and substitute into (4.55). Could you use an ordinary Taylor series instead? Yes, the point x = 0 is not a singular point 
at all, but it is just as easy (and more systematic and less prone to error) to use the same method in all cases. 

$$(1 - x^2)\sum_0^\infty a_k(k+s)(k+s-1)x^{k+s-2} - 2x\sum_0^\infty a_k(k+s)x^{k+s-1} + C\sum_0^\infty a_kx^{k+s} = 0$$

$$\sum_0^\infty a_k(k+s)(k+s-1)x^{k+s-2} + \sum_0^\infty a_k\bigl[-2(k+s) - (k+s)(k+s-1)\bigr]x^{k+s} + C\sum_0^\infty a_kx^{k+s} = 0$$

$$\sum_{n=-2}^\infty a_{n+2}(n+s+2)(n+s+1)x^{n+s} - \sum_{n=0}^\infty a_n\bigl[(n+s)^2 + (n+s)\bigr]x^{n+s} + C\sum_{n=0}^\infty a_nx^{n+s} = 0$$

In the last equation you see the usual substitution k = n + 2 for the first sum and k = n for the rest. That makes the 
exponents match across the equation. In the process, I simplified some of the algebraic expressions. 

The indicial equation comes from the n = — 2 term, which appears only once. 


$$a_0s(s - 1) = 0, \qquad\text{so}\qquad s = 0,\ 1$$

Now set the coefficient of $x^{n+s}$ to zero, and solve for $a_{n+2}$ in terms of $a_n$. Also note that $s$ is a non-negative integer,
which says that the solution is non-singular at x = 0, consistent with the fact that zero is a regular point of the differential 
equation. 


$$a_{n+2} = a_n\frac{(n+s)(n+s+1) - C}{(n+s+2)(n+s+1)} \tag{4.56}$$

$$a_2 = a_0\frac{s(s+1) - C}{(s+2)(s+1)}, \qquad\text{then}\qquad a_4 = a_2\frac{(s+2)(s+3) - C}{(s+4)(s+3)}, \qquad\text{etc.} \tag{4.57}$$


This looks messier than it is. Notice that the only combination of indices that shows up is n + s. The index s is 0 or 1, 
and n is an even number, so the combination n + s covers the non-negative integers: 0, 1, 2, . . . 

The two solutions to the Legendre differential equation come from the two cases, s = 0, 1. 


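$$\begin{aligned} s = 0:&\quad a_0\left[1 + \left(\frac{-C}{2}\right)x^2 + \left(\frac{-C}{2}\right)\left(\frac{2\cdot3 - C}{4\cdot3}\right)x^4 + \left(\frac{-C}{2}\right)\left(\frac{2\cdot3 - C}{4\cdot3}\right)\left(\frac{4\cdot5 - C}{6\cdot5}\right)x^6 + \cdots\right] \\ s = 1:&\quad a_0'\left[x + \left(\frac{1\cdot2 - C}{3\cdot2}\right)x^3 + \left(\frac{1\cdot2 - C}{3\cdot2}\right)\left(\frac{3\cdot4 - C}{5\cdot4}\right)x^5 + \cdots\right] \end{aligned} \tag{4.58}$$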


and the general solution is a sum of these. 

This procedure gives both solutions to the differential equation, one with even powers and one with odd powers. 
Both are infinite series and are called Legendre Functions. An important point about both of them is that they blow up 
as $x \to \pm1$. This fact shouldn't be too surprising, because the differential equation (4.55) has a singular point there.

$$y'' - \frac{2x}{(1 + x)(1 - x)}y' + \frac{C}{(1 + x)(1 - x)}y = 0 \tag{4.59}$$


It's a regular singular point, but it is still singular. A detailed calculation in the next section shows that these solutions 
behave as $\ln(1 - x)$ near $x = 1$.

There is an exception! If the constant $C$ is for example $C = 6$, then with $s = 0$ the equations (4.57) are

$$a_2 = a_0\frac{-6}{2}, \qquad a_4 = a_2\frac{6 - 6}{12} = 0, \qquad a_6 = a_8 = \cdots = 0$$

The infinite series terminates in a polynomial

$$a_0 + a_2x^2 = a_0[1 - 3x^2]$$

This (after a conventional rearrangement) is a Legendre Polynomial,

$$P_2(x) = \frac{3}{2}x^2 - \frac{1}{2}$$

The numerator in Eq. (4.56) for $a_{n+2}$ is $[(n+s)(n+s+1) - C]$. If this happens to equal zero for some value of
$n = N$, then $a_{N+2} = 0$ and so then all the rest of $a_{N+4}, \ldots$ are zero too. The series is a polynomial. This will
happen only for special values of $C$, such as the value $C = 6$ above. The values of $C$ that have this special property are

$$C = \ell(\ell + 1), \qquad\text{for}\qquad \ell = 0, 1, 2, \ldots \tag{4.60}$$

This may be easier to see in the explicit representation, Eq. (4.58). When a numerator equals zero, all the rest that
follow are zero too. When $C = \ell(\ell + 1)$ for even $\ell$, the first series terminates in a polynomial. Similarly for odd
$\ell$ the second series is a polynomial. These are the Legendre polynomials, denoted $P_\ell(x)$, and the conventional
normalization is to require that their value at $x = 1$ is one.

$$P_0(x) = 1 \qquad P_1(x) = x \qquad P_2(x) = \frac{3}{2}x^2 - \frac{1}{2}$$
$$P_3(x) = \frac{5}{2}x^3 - \frac{3}{2}x \qquad P_4(x) = \frac{35}{8}x^4 - \frac{30}{8}x^2 + \frac{3}{8}$$


The special case for which the series terminates in a polynomial is by far the most commonly used solution to Legendre's 
equation. You seldom encounter the general solutions as in Eq. (4.58). 
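
The terminating series can be generated mechanically from the recursion relation Eq. (4.56). The sketch below is a
minimal illustration using exact rational arithmetic; the function name `legendre_coefficients` is hypothetical. It
runs the recursion with $C = \ell(\ell + 1)$, using the $s = 0$ series for even $\ell$ and the $s = 1$ series for odd
$\ell$, and then rescales so that the value at $x = 1$ is one.

```python
from fractions import Fraction

def legendre_coefficients(ell):
    """Coefficients [c_0, c_1, ..., c_ell] of P_ell(x) in powers of x,
    built from the recursion Eq. (4.56) with C = ell (ell + 1)."""
    C = ell * (ell + 1)
    s = ell % 2                          # s = 0 for even ell, s = 1 for odd ell
    coeffs = [Fraction(0)] * (ell + 1)
    a = Fraction(1)                      # a_0 is arbitrary; the normalization fixes it below
    n = 0
    while n + s <= ell:
        coeffs[n + s] = a
        # a_{n+2} = a_n [(n+s)(n+s+1) - C] / [(n+s+2)(n+s+1)]
        a = a * Fraction((n + s) * (n + s + 1) - C, (n + s + 2) * (n + s + 1))
        n += 2
    value_at_one = sum(coeffs)           # the polynomial evaluated at x = 1
    return [c / value_at_one for c in coeffs]

for ell in range(5):
    print(ell, [str(c) for c in legendre_coefficients(ell)])
```

For $\ell = 2$ this gives the coefficients $-1/2$, $0$, $3/2$, the same polynomial found above from $C = 6$.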

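A few properties of the $P_\ell$ are

$$\text{(a)}\quad \int_{-1}^{1}dx\,P_n(x)P_m(x) = \frac{2}{2n + 1}\delta_{nm} \qquad\text{where}\qquad \delta_{nm} = \begin{cases} 1 & \text{if } n = m \\ 0 & \text{if } n \ne m \end{cases}$$
$$\text{(b)}\quad (n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) - nP_{n-1}(x)$$
$$\text{(c)}\quad P_n(x) = \frac{(-1)^n}{2^nn!}\frac{d^n}{dx^n}(1 - x^2)^n$$
$$\text{(d)}\quad P_n(1) = 1 \qquad P_n(-x) = (-1)^nP_n(x)$$
$$\text{(e)}\quad \left(1 - 2tx + t^2\right)^{-1/2} = \sum_{n=0}^{\infty}t^nP_n(x) \tag{4.62}$$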


4.12 Asymptotic Behavior 

This is a slightly technical subject, but it will come up occasionally in electromagnetism when you dig into the details 
of boundary value problems. It will come up in quantum mechanics when you solve some of the standard eigenvalue 
problems that you face near the beginning of the subject. If you haven’t come to these yet then you can skip this part 
for now.

You solve a differential equation using a Frobenius series and now you need to know something about the solution.
In particular, how does the solution behave for large values of the argument? All you have in front of you is an infinite
series, and it isn't obvious how it will behave far away from the origin. In the line just after Eq. (4.59) it says that these
Legendre functions behave as $\ln(1 - x)$. How can you tell this from the series in Eq. (4.58)?

There is a theorem that addresses this. Take two functions described by two series: 

$$f(x) = \sum^{\infty}a_kx^k \qquad\text{and}\qquad g(x) = \sum^{\infty}b_kx^k$$

It does not matter where the sums start because you are concerned just with the large values of $k$. The lower limit could
as easily be $-14$ or $+27$ with no change in the result. The ratio test, Eq. (2.8), will determine the radius of convergence
of these series, and

$$\left|\frac{a_{k+1}x^{k+1}}{a_kx^k}\right| < C < 1 \qquad\text{for large enough } k$$

is enough to ensure convergence. The largest $x$ for which this holds defines the radius of convergence, maybe 1, maybe
$\infty$. Call it $R$.


Assume that (after some value of $k$) all the $a_k$ and $b_k$ are positive, then look at the ratio of the ratios,

$$\frac{a_{k+1}/a_k}{b_{k+1}/b_k}$$

If this approaches one, that will tell you only that the radii of convergence of the two series are the same. If it approaches 
one very fast, and if either one of the functions goes to infinity as x approaches the radius of convergence, then it says 
that the asymptotic behaviors of the functions defined by the series are the same. 


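$$\text{If}\quad \frac{a_{k+1}/a_k}{b_{k+1}/b_k} - 1 \to 0 \ \text{ as fast as } \frac{1}{k^2} \quad\text{and if either}\quad f(x) \ \text{or}\ g(x) \to \infty \ \text{ as } x \to R$$

$$\text{Then}\quad \frac{f(x)}{g(x)} \to \text{a constant as } x \to R$$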


There are more general ways to state this, but this handles most cases of interest. 
Compare these series near x = 1. 

$$\frac{1}{1 - x} = \sum_0^\infty x^k, \qquad\text{or}\qquad \ln(1 - x) = -\sum_1^\infty\frac{x^k}{k}, \qquad\text{or}$$

$$(1 - x)^{-1/2} = \sum_{k=0}^{\infty}\frac{\alpha(\alpha - 1)\cdots(\alpha - k + 1)}{k!}(-x)^k \qquad (\alpha = -1/2)$$

Even in the third case, the signs of the terms are the same after a while, so this is relevant to the current discussion.
The ratio of ratios for the first and second series is

$$\frac{a_{k+1}/a_k}{b_{k+1}/b_k} = \frac{1}{k/(k + 1)} = 1 + \frac{1}{k}$$

This differs from one in order $1/k$, not $1/k^2$. These series behave differently as $x$ approaches the radius of
convergence ($x \to 1$). But you knew that. The point is to compare an unknown series to a known one.

Applying this theorem requires some fussy attention to detail. You must make sure that the indices in one series 
correspond exactly to the indices in the other. Take the Legendre series, Eq. (4.56) and compare it to a logarithmic 
series. Choose s = 0 to be specific; then only even powers of x appear. That means that I want to compare it to a 
series with even powers, and with radius of convergence = 1. First try a binomial series such as for $(1 - x^2)^\alpha$, but that
doesn't work. See for yourself. The logarithm $\ln(1 - x^2)$ turns out to be right. From Eq. (4.56) and from the logarithm
series, 


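$$f(x) = \sum_{n\ \text{even}}^{\infty}a_nx^n \qquad\text{with}\qquad a_{n+2} = a_n\frac{(n+s)(n+s+1) - C}{(n+s+2)(n+s+1)}$$

$$g(x) = \ln(1 - x^2) = -\sum_{k=1}^{\infty}\frac{x^{2k}}{k} = \sum_k b_kx^{2k}$$

To make the indices match, let $n = 2k$ in the second series.

$$g(x) = -\sum_{n\ \text{even}}\frac{1}{n/2}\,x^n = \sum_n c_nx^n$$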


Now look at the ratios. 

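$$\frac{a_{n+2}}{a_n} = \frac{n(n + 1) - C}{(n + 2)(n + 1)} = \frac{1 + \frac{1}{n} - \frac{C}{n^2}}{1 + \frac{3}{n} + \frac{2}{n^2}} = 1 - \frac{2}{n} + \cdots$$

$$\frac{c_{n+2}}{c_n} = \frac{2/(n + 2)}{2/n} = \frac{n}{n + 2} = \frac{1}{1 + \frac{2}{n}} = 1 - \frac{2}{n} + \cdots$$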

These agree to order $1/n$, so the ratio of the ratios differs from one only in order $1/n^2$, satisfying the requirements
of the test. This says that the Legendre functions (the ones where the series does not terminate in a polynomial) are
logarithmically infinite near $x = \pm1$. It's a mild infinity, but it is still an infinity. Is this bad? Not by itself, after all the
electric potential of a line charge has a logarithm of $r$ in it. Singular solutions aren't necessarily wrong, it just means
that you have to look closely at how you are using them.
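
The claim about the order of the difference is easy to test numerically. In this sketch (function names and the value
C = 2.5 are illustrative assumptions; any C not of the form $\ell(\ell + 1)$ gives a non-terminating series) the ratio of
ratios is computed for the $s = 0$ Legendre series and the series for $\ln(1 - x^2)$. Algebraically the two expressions
combine to $1 - C/[n(n + 1)]$, so the deviation from one multiplied by $n(n + 1)$ should come out as $-C$ for every $n$.

```python
def legendre_ratio(n, C):
    """a_{n+2}/a_n from Eq. (4.56) with s = 0."""
    return (n * (n + 1) - C) / ((n + 2) * (n + 1))

def log_ratio(n):
    """c_{n+2}/c_n for ln(1 - x^2) written as a series in even powers x^n."""
    return n / (n + 2)

def ratio_of_ratios(n, C):
    return legendre_ratio(n, C) / log_ratio(n)

C = 2.5    # illustrative, non-terminating case
for n in (10, 100, 1000):
    deviation = ratio_of_ratios(n, C) - 1
    print(n, deviation, deviation * n * (n + 1))
```

The deviation falls off as $1/n^2$, which is the condition the test requires.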


Exercises 

1 What algebraic equations do you have to solve to find the solutions of these differential equations? 

$$\frac{d^3x}{dt^3} + a\frac{dx}{dt} + bx = 0, \qquad \frac{d^{10}z}{du^{10}} - 3z = 0$$


2 These equations are separable, as in section 4.7. Separate them and integrate, solving for the dependent variable, 
with one arbitrary constant. 

$$\frac{dN}{dt} = -\lambda N, \qquad \frac{dx}{dt} = a^2 + x^2, \qquad \frac{dv_x}{dt} = -a\left(1 - e^{-\beta v_x}\right)$$

3 From Eq. (4.40) and (4.41) what are the formulas for putting capacitors or inductors in series and parallel?

Problems

4.1 If the equilibrium position x = 0 for Eq. (4.4) is unstable instead of stable, this reverses the sign in front of k. Solve 
the problem that led to Eq. (4.10) under these circumstances. That is, the damping constant is b as before, and the 
initial conditions are $x(0) = 0$ and $v_x(0) = v_0$. What is the small time and what is the large time behavior?

Ans: $\left(2mv_0/\sqrt{b^2 + 4km}\right) e^{-bt/2m}\sinh\left(\sqrt{b^2/4m^2 + k/m}\;t\right)$

4.2 In the damped harmonic oscillator problem, Eq. (4.4), suppose that the damping term is an anti-damping term. It
has the sign opposite to the one that I used ($+b\,dx/dt$). Solve the problem with the initial condition $x(0) = 0$ and
$v_x(0) = v_0$ and describe the resulting behavior.

Ans: $\left(2mv_0/\sqrt{4km - b^2}\right) e^{bt/2m}\sin\left(\sqrt{4km - b^2}\;t/2m\right)$

4.3 A point mass m moves in one dimension under the influence of a force $F_x$ that has a potential energy $V(x)$. Recall
that the relation between these is

$$F_x = -\frac{dV}{dx}$$

Take the specific potential energy $V(x) = -V_0 a^2/(a^2 + x^2)$, where $V_0$ is positive. Sketch V. Write the equation
$F_x = ma_x$. There is an equilibrium point at x = 0, and if the motion is over only small distances you can do a power
series expansion of $F_x$ about x = 0. What is the differential equation now? Keep just the lowest order non-vanishing
term in the expansion for the force and solve that equation subject to the initial conditions that at time t = 0, $x(0) = x_0$
and $v_x(0) = 0$.

How does the graph of V change as you vary a from small to large values and how does this same change in a affect
the behavior of your solution? Ans: $\omega = \sqrt{2V_0/ma^2}$

4.4 The same as the preceding problem except that the potential energy function is $+V_0 a^2/(a^2 + x^2)$. Ans: $x(t) =
x_0\cosh\left(\sqrt{2V_0/ma^2}\;t\right)$ ($|x| < a/4$ or so, depending on the accuracy you want.)

4.5 For the case of the undamped harmonic oscillator and the force Eq. (4.13), start from the beginning and derive 
the solution subject to the initial conditions that the initial position is zero and the initial velocity is zero. At the end, 
compare your result to the result of Eq. (4.15) to see if they agree where they should agree. 

4.6 Check the dimensions in the result for the forced oscillator, Eq. (4.15). 

4.7 Fill in the missing steps in the derivation of Eq. (4.15).

4.8 For the undamped harmonic oscillator apply an extra oscillating force so that the equation to solve is

$$m\frac{d^2x}{dt^2} = -kx + F_{\text{ext}}(t)$$

where the external force is $F_{\text{ext}}(t) = F_0\cos\omega t$. Assume that $\omega \ne \omega_0 = \sqrt{k/m}$.

Find the general solution to the homogeneous part of this problem. 

Find a solution for the inhomogeneous case. You can readily guess what sort of function will give you a $\cos\omega t$ from a
combination of x and its second derivative.

Add these and apply the initial conditions that at time t = 0 the mass is at rest at the origin. Be sure to check your
results for plausibility: 0) dimensions; 1) $\omega = 0$; 2) $\omega \to \infty$; 3) t small (not zero). In each case explain why the result
is as it should be.

Ans: $(F_0/m)\left[-\cos\omega_0 t + \cos\omega t\right]/(\omega_0^2 - \omega^2)$


4.9 In the preceding problem I specified that $\omega \ne \omega_0 = \sqrt{k/m}$. Having solved it, you know why this condition is
needed. Now take the final result of that problem, including the initial conditions, and take the limit as $\omega \to \omega_0$. [What
is the definition of a derivative?] You did draw a graph of your result, didn't you? Ans: $(F_0/2m\omega_0)\,t\sin\omega_0 t$


4.10 Show explicitly that you can write the solution Eq. (4.7) in any of several equivalent ways,

$$Ae^{i\omega_0 t} + Be^{-i\omega_0 t} = C\cos\omega_0 t + D\sin\omega_0 t = E\cos(\omega_0 t + \phi)$$

I.e., given A and B, what are C and D, what are E and $\phi$? Are there any restrictions in any of these cases?


4.11 In the damped harmonic oscillator, you can have the special (critical damping) case for which $b^2 = 4km$ and for
which $\omega' = 0$. Use a series expansion to take the limit of Eq. (4.10) as $\omega' \to 0$. Also graph this solution. What would
happen if you took the same limit in Eqs. (4.8) and (4.9), before using the initial conditions?

4.12 (a) In the limiting solution for the forced oscillator, Eq. (4.16), what is the nature of the result for small time? 
Expand the solution through order t 2 and understand what you get. Be careful to be consistent in keeping terms to the 
same order in t. 

(b) Part (a) involved letting $\beta$ be very large, then examining the behavior for small t. Now reverse the order: what is
the first non-vanishing order in t that you will get if you go back to Eq. (4.13), expand that to first non-vanishing order
in time, use that for the external force in Eq. (4.12), and find x(t) for small t? Recall that in this example $x(0) = 0$ and
$\dot{x}(0) = 0$, so you can solve for $\ddot{x}(0)$ and then for $\dddot{x}(0)$. The two behaviors are very different.

4.13 The undamped harmonic oscillator equation is $d^2x/dt^2 + \omega^2 x = 0$. Solve this by Frobenius series expansion about
t = 0.

4.14 Check the algebra in the derivation of the n = 0 Bessel equation. Explicitly verify that the general expression for
$a_{2k}$ in terms of $a_0$ is correct, Eq. (4.22).

4.15 Work out the Frobenius series solution to the Bessel equation for the n = 1/2, s = −1/2 case. Graph both solutions,
this one and Eq. (4.23).

4.16 Derive the Frobenius series solution to the Bessel equation for the value of n = 1. Show that this method doesn't 
yield a second solution for this case either. 

4.17 Try using a Frobenius series method on $y'' + y/x^3 = 0$ around x = 0.

4.18 Solve by Frobenius series $x^2u'' + 4xu' + (x^2 + 2)u = 0$. You should be able to recognize the resulting series (after
a little manipulation).

4.19 The harmonic oscillator equation, $d^2y/dx^2 + k^2y = 0$, is easy in terms of the variable x. What is this equation
if you change variables to $z = 1/x$, getting an equation in such things as $d^2y/dz^2$? What sort of singularity does this
equation have at z = 0? And of course, write down the answer for y(z) to see what this sort of singularity can lead to.
Graph it.

4.20 Solve by Frobenius series solution about x = 0: $y'' + xy = 0$.

Ans: $1 - (x^3/3!) + (1\cdot 4\,x^6/6!) - (1\cdot 4\cdot 7\,x^9/9!) + \cdots$ is one.

4.21 From the differential equation $d^2u/dx^2 = -u$, finish the derivation for $c'$ as in Eq. (4.29). Derive identities for
the functions $c(x + y)$ and $s(x + y)$.

4.22 The chain rule lets you take the derivative of the composition of two functions. The function inverse to s is
the function f that satisfies $f(s(x)) = x$. Differentiate this equation with respect to x and derive that f satisfies
$df(x)/dx = 1/\sqrt{1 - x^2}$. What is the derivative of the function inverse to c?

4.23 For the differential equation $u'' = +u$ (note the sign change) use the same boundary conditions for two independent
solutions that I used in Eq. (4.28). For this new example evaluate $c'$ and $s'$. Does $c^2 + s^2$ have the nice property that it
did in section 4.5? What about $c^2 - s^2$? What are $c(x + y)$ and $s(x + y)$? What is the derivative of the function inverse
to s? to c?

4.24 Apply the Green's function method for the force $F_0\left(1 - e^{-\beta t}\right)$ on the harmonic oscillator without damping. Verify
that it agrees with the previously derived result, Eq. (4.15). They should match in a special case.

4.25 An undamped harmonic oscillator with natural frequency $\omega_0$ is at rest for time t < 0. Starting at time zero there
is an added force $F_0\sin\omega_0 t$. Use Green's functions to find the motion for time t > 0, and analyze the solution for
both small and large time, determining if your results make sense. Compare the solution to problems 4.9 and 4.11.
Ans: $\left(F_0/2m\omega_0^2\right)\left[\sin(\omega_0 t) - \omega_0 t\cos(\omega_0 t)\right]$

4.26 Derive the Green’s function analogous to Eq. (4.32) for the case that the harmonic oscillator is damped. 

4.27 Radioactive processes have the property that the rate of decay of nuclei is proportional to the number of nuclei
present. That translates into the differential equation $dN/dt = -\lambda N$, where $\lambda$ is a constant depending on the nucleus.
At time t = 0 there are $N_0$ nuclei; how many are present at time t later? The half-life is the time in which one-half of
the nuclei decay; what is that in terms of $\lambda$? Ans: $\ln 2/\lambda$

4.28 (a) In the preceding problem, suppose that the result of the decay is another nucleus (the "daughter") that is
itself radioactive with its own decay constant $\lambda_2$. Call the first one above $\lambda_1$. Write the differential equation for the
time-derivative of the number, $N_2$, of this nucleus. You note that $N_2$ will change for two reasons, so in time dt the
quantity $dN_2$ has two contributions: one is the decrease because of the radioactivity of the daughter, the other an
increase due to the decay of the parent. Set up the differential equation for $N_2$ and you will be able to use the result of
the previous problem as input to this; then solve the resulting differential equation for the number of daughter nuclei as
a function of time. Assume that you started with none, $N_2(0) = 0$.

(b) Next, the "activity" is the total number of all types of decays per time. Compute the activity and graph it. For the
plot, assume that $\lambda_1$ is substantially smaller than $\lambda_2$ and plot the total activity as a function of time. Then examine the
reverse case, $\lambda_1 \gg \lambda_2$.

Ans: $N_0\lambda_1\left[(2\lambda_2 - \lambda_1)e^{-\lambda_1 t} - \lambda_2 e^{-\lambda_2 t}\right]/(\lambda_2 - \lambda_1)$

4.29 The “snowplow problem” was made famous by Ralph Agnew: A snowplow starts at 12:00 Noon in a heavy and 
steady snowstorm. In the first hour it goes 2 miles; in the second hour it goes 1 mile. When did the snowstorm start? 
Ans: 11:23 

4.30 Verify that the equations (4.52) really do satisfy the original differential equations. 

4.31 When you use the "dry friction" model Eq. (4.2) for the harmonic oscillator, you can solve the problem by dividing
it into pieces. Start at time t = 0 and position $x = x_0$ (positive). The initial velocity of the mass m is zero. As
the mass is pulled to the left, set up the differential equation and solve it up to the point at which it comes to a halt.
Where is that? You can take that as a new initial condition and solve the problem as the mass is pulled to the right
until it stops. Where is that? Then keep repeating the process. Instead of further repetition, examine the case for
which the coefficient of kinetic friction is small, and determine to lowest order in the coefficient of friction what is the
change in the amplitude of oscillation up to the first stop. From that, predict what the amplitude will be after the
mass has swung back to the original side and come to its second stop. In this small $\mu_k$ approximation, how many
oscillations will it undergo until all motion stops? Let $b = \mu_k F_N$. Ans: Let $t_n = n\pi/\omega_0$, then for $t_n < t < t_{n+1}$,
$x(t) = \left[x_0 - (2n + 1)b/k\right]\cos\omega_0 t + (-1)^n b/k$. Stops when $t \approx \pi k x_0/2\omega_0 b$ roughly.

4.32 A mass m is in an undamped one-dimensional harmonic oscillator and is at rest. A constant external force $F_0$ is
applied for the time interval T and is then turned off. What is the motion of the oscillator as a function of time for all 
t > 0? For what value of T is the amplitude of the motion a maximum after the force is turned off? For what values is 
the motion a minimum? Of course you need an explanation of why you should have been able to anticipate these two 
results. 


4.33 Starting from the solution Eq. (4.52) assume the initial conditions that both masses start from the equilibrium
position and that the first mass is given an initial velocity $v_{x1} = v_0$. Find the subsequent motion of the two masses and
analyze it.

4.34 If there is viscous damping on the middle spring of Eqs. (4.45) so that each mass feels an extra force depending 
on their relative velocity, then these equations will be 


$$m_1\frac{d^2x_1}{dt^2} = -k_1x_1 - k_3(x_1 - x_2) - b(\dot{x}_1 - \dot{x}_2), \quad\text{and}\quad m_2\frac{d^2x_2}{dt^2} = -k_2x_2 - k_3(x_2 - x_1) - b(\dot{x}_2 - \dot{x}_1)$$

Solve these subject to the conditions that all initial velocities are zero and that the first mass is pushed to coordinate $x_0$
and released. Use the same assumption as before that $m_1 = m_2 = m$ and $k_1 = k_2$.

4.35 For the damped harmonic oscillator apply an extra oscillating force so that the equation to solve is 


$$m\frac{d^2x}{dt^2} = -b\frac{dx}{dt} - kx + F_{\text{ext}}(t)$$

where the external force is $F_{\text{ext}}(t) = F_0e^{i\omega t}$.

(a) Find the general solution to the homogeneous part of this problem.

(b) Find a solution for the inhomogeneous case. You can readily guess what sort of function will give you an $e^{i\omega t}$ from
a combination of x and its first two derivatives.

This problem is easier to solve than the one using $\cos\omega t$, and at the end, to get the solution for the cosine case, all you
have to do is to take the real part of your result.

4.36 You can solve the circuit equation Eq. (4.37) more than one way. Solve it by the methods used earlier in this 
chapter. 


4.37 For a second order differential equation you can pick the position and the velocity any way that you want, and the 
equation then determines the acceleration. Differentiate the equation and you find that the third derivative is determined 
too. 


$$\frac{d^2x}{dt^2} = -\frac{b}{m}\frac{dx}{dt} - \frac{k}{m}x \quad\text{implies}\quad \frac{d^3x}{dt^3} = -\frac{b}{m}\frac{d^2x}{dt^2} - \frac{k}{m}\frac{dx}{dt}$$

Assume the initial position is zero, $x(0) = 0$, and the initial velocity is $v_x(0) = v_0$; determine the second derivative at
time zero. Now determine the third derivative at time zero. Now differentiate the above equation again and determine
the fourth derivative at time zero.

From this, write down the first five terms of the power series expansion of x(t) about t = 0.

Compare this result to the power series expansion of Eq. (4.10) to this order. 


4.38 Use the techniques of section 4.6, start from the equation $m\,d^2x/dt^2 = F_x(t)$ with no spring force or damping.

(a) Find the Green's function for this problem, that is, what is the response of the mass to a small kick over a small
time interval (the analog of Eq. (4.32))? Develop the analog of Eq. (4.34) for this case. Apply your result to the special
case that $F_x(t) = F_0$, a constant for time t > 0.

(b) You know that the solution of this differential equation involves two integrals of $F_x(t)$ with respect to time, so
how can this single integral do the same thing? Differentiate this Green's function integral (for arbitrary $F_x$) twice with
respect to time to verify that it really gives what it's supposed to. This is a special case of some general results, problems
15.19 and 15.20.

Ans: $\frac{1}{m}\int_{-\infty}^{t} dt'\,F_x(t')(t - t')$

4.39 A point mass m moves in one dimension under the influence of a force $F_x$ that has a potential energy $V(x)$. Recall
that the relation between these is $F_x = -dV/dx$, and take the specific potential energy $V(x) = -V_0e^{-x^2/a^2}$, where
$V_0$ is positive. Sketch V. Write the equation $F_x = ma_x$. There is an equilibrium point at x = 0, and if the motion
is over only small distances you can do a power series expansion of $F_x$ about x = 0. What is the differential equation
now? Keep just the lowest order non-vanishing term in the expansion for the force and solve that equation subject to
the initial conditions that at time t = 0, $x(0) = x_0$ and $v_x(0) = 0$. As usual, analyze large and small a.

4.40 Solve by Frobenius series methods

d 2 y 2 dy 1 
dx 2 + x dx + 

Ans: is one - 

4.41 Find a series solution about x = 0 for $y'' + y\sec x = 0$, at least to a few terms.

Ans: $a_0\left[1 - \tfrac{1}{2}x^2 + 0x^4 + \cdots\right] + a_1\left[x - \tfrac{1}{6}x^3 - \tfrac{1}{60}x^5 + \cdots\right]$

4.42 Fill in the missing steps in the equations (4.55) to Eq. (4.58). 

4.43 Verify the orthogonality relation Eq. (4.62)(a) for Legendre polynomials of order $\ell$ = 0, 1, 2, 3.

4.44 Start with the function $\left(1 - 2xt + t^2\right)^{-1/2}$. Use the binomial expansion and collect terms to get a power series
in t. The coefficients in this series are functions of x. Carry this out at least to the coefficient of $t^3$ and show that the
coefficients are Legendre polynomials. This is called the generating function for the $P_\ell$'s. It is $\sum_{\ell=0}^{\infty} P_\ell(x)\,t^\ell$.

4.45 In the equation of problem 4.17, make the change of independent variable x = 1 / z. Without actually carrying out 
the solution of the resulting equation, what can you say about solving it? 

4.46 Show that Eq. (4.62)(c) has the correct value $P_n(1) = 1$ for all n. Note: $(1 - x^2) = (1 + x)(1 - x)$ and you are
examining the point x = 1.

4.47 Solve for the complete solution of Eq. (4.55) for the case C = 0. For this, don't use series methods, but get the
closed form solution. Ans: $A\tanh^{-1}x + B$

4.48 Derive the condition in Eq. (4.60). Which values of s correspond to which values of $\ell$?

4.49 Start with the equation $y'' + P(x)y' + Q(x)y = 0$ and assume that you have found one solution: $y = f(x)$.
Perhaps you used series methods to find it. (a) Make the substitution $y(x) = f(x)g(x)$ and deduce a differential
equation for g. Let $G = g'$ and solve the resulting first order equation for G. Finally write g as an integral. This is one
method (not necessarily the best) to find the second solution to a differential equation.

(b) Apply this result to the $\ell = 0$ solution of Legendre's equation to find another way to solve problem 4.47. Ans: $y = f\int dx\,\frac{1}{f^2}\exp\left(-\int P\,dx\right)$

4.50 Treat the damped harmonic oscillator as a two-point boundary value problem.

$$m\ddot{x} + b\dot{x} + kx = 0, \quad\text{with}\quad x(0) = 0 \quad\text{and}\quad x(T) = d$$


[For this problem, if you want to set b = k = T = d = 1 that's o.k.] 

(a) Assume that m is very small. To a first approximation neglect it and solve the problem. 

(b) Since you failed to do part (a) (it blew up in your face), solve it exactly instead and examine the solution for very
small m. Why couldn't you make the approximation of neglecting m? Draw graphs. Welcome to the world of boundary
layer theory and singular perturbations. Ans: $x(t) \approx e^{1-t} - e^{1-t/m}$

4.51 Solve the differential equation $\dot{x} = Ax^2(1 + \omega t)$ in closed form and compare the series expansion of the result to
Eq. (4.25). Ans: $x = \alpha/\left[1 - A\alpha\left(t + \omega t^2/2\right)\right]$

4.52 Solve the same differential equation $\dot{x} = Ax^2(1 + \omega t)$ with $x(t_0) = \alpha$ by doing a few iterations of Eq. (4.27).

4.53 Analyze the steady-state part of the solution Eq. (4.42). For the input potential $V_0e^{i\omega t}$, find the real part of the
current explicitly, writing the final form as $I_{\max}\cos(\omega t - \phi)$. Plot $I_{\max}$ and $\phi$ versus $\omega$. Plot V(t) and I(t) on a second
graph with time as the axis. Recall that for these V and I the real part is understood.

4.54 If you have a resistor, a capacitor, and an inductor in series with an oscillating voltage source, what is the steady-state
solution for the current? Write the final form as $I_{\max}\cos(\omega t - \phi)$, and plot $I_{\max}$ and $\phi$ versus $\omega$. See what happens
if you vary some of the parameters.

Ans: $I = V_0\cos(\omega t - \phi)/|Z|$ where $|Z| = \sqrt{R^2 + (\omega L - 1/\omega C)^2}$ and $\tan\phi = (\omega L - 1/\omega C)/R$

4.55 In the preceding problem, what if the voltage source is a combination of DC and AC, so it is $V(t) = V_0 + V_1\cos\omega t$?
Find the steady state solution now.


4.56 What is the total impedance left to right in the circuit

[Figure: circuit diagram; component labels $R_1$, $R_2$, $C_1$, $C_2$, $L_1$, $L_2$]

Ans: $R_1 + (1/i\omega C_2) + 1/\left[(1/i\omega L_1) + 1/\left[(1/i\omega C_1) + 1/\left((1/R_2) + (1/i\omega L_2)\right)\right]\right]$

4.57 Find a Frobenius series solution about x = 0 for $y'' + y\csc x = 0$, at least to a few terms.

Ans: $x - \tfrac{1}{2}x^2 + \tfrac{1}{12}x^3 - \tfrac{1}{48}x^4 + \tfrac{1}{192}x^5 + \cdots$

4.58 Find a series solution about x = 0 for $x^2y'' - 2ixy' + (x^2 + i - 1)y = 0$.

4.59 Just as you can derive the properties of the circular and hyperbolic trigonometric functions from the differential
equations that they satisfy, you can do the same for the exponential. Take the equation $u' = u$ and consider that solution
satisfying the boundary condition $u(0) = 1$.

(a) Prove that u satisfies the identity $u(x + y) = u(x)u(y)$.

(b) Prove that the function inverse to u has the derivative $1/x$.

4.60 Find the asymptotic behavior of the Legendre series for the s = 1 case. 

4.61 Find Frobenius series solutions for xy" + y = 0. 



Fourier Series 

Fourier series started life as a method to solve problems about the flow of heat through ordinary materials. It has grown 
so far that if you search our library's catalog for the keyword "Fourier" you will find 618 entries as of this date. It is a
tool in abstract analysis and electromagnetism and statistics and radio communication and .... People have even tried
to use it to analyze the stock market. (It didn't help.) The representation of musical sounds as sums of waves of various
frequencies is an audible example. It provides an indispensable tool in solving partial differential equations, and a later
chapter will show some of these tools at work. 

5.1 Examples 

The power series or Taylor series is based on the idea that you can write a general function as an infinite series of powers. 
The idea of Fourier series is that you can write a function as an infinite series of sines and cosines. You can also use 
functions other than trigonometric ones, but I'll leave that generalization aside for now, except to say that Legendre 
polynomials are an important example of functions used for such more general expansions. 

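An example: On the interval 0 < x < L the function $x^2$ varies from 0 to $L^2$. It can be written as the series of
cosines

$$x^2 = \frac{L^2}{3} + \frac{4L^2}{\pi^2}\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos\frac{n\pi x}{L} = \frac{L^2}{3} - \frac{4L^2}{\pi^2}\left[\cos\frac{\pi x}{L} - \frac{1}{4}\cos\frac{2\pi x}{L} + \frac{1}{9}\cos\frac{3\pi x}{L} - \cdots\right] \tag{5.1}$$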


To see if this is even plausible, examine successive partial sums of the series, taking one term, then two terms, etc. 
Sketch the graphs of these partial sums to see if they start to look like the function they are supposed to represent (left 
graph). The graphs of the series, using terms up to n = 5 do pretty well at representing the parabola. 

The same function can be written in terms of sines with another series: 


$$x^2 = \frac{2L^2}{\pi}\sum_{n=1}^{\infty}\left[\frac{(-1)^{n+1}}{n} - \frac{2}{\pi^2n^3}\bigl(1 - (-1)^n\bigr)\right]\sin\frac{n\pi x}{L} \tag{5.2}$$


and again you can see how the series behaves by taking one to several terms of the series (right graph). The graphs
show the parabola $y = x^2$ and partial sums of the two series with terms up to n = 1, 3, 5.

The second form doesn't work as well as the first one, and there's a reason for that. The sine functions all go to
zero at x = L and $x^2$ doesn't, making it hard for the sum of sines to approximate the desired function. They can do it,
but it takes a lot more terms in the series to get a satisfactory result. The series Eq. (5.1) has terms that go to zero as
$1/n^2$, while the terms in the series Eq. (5.2) go to zero only as $1/n$.*
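
One way to see the difference in convergence is to add up the two series numerically. The following is a minimal sketch, assuming L = 1 and sample points picked only for illustration; the function names are hypothetical and not from the text. It evaluates partial sums of Eq. (5.1) and Eq. (5.2) and prints how far each is from $x^2$.

```python
import math


def cosine_partial_sum(x: float, L: float, N: int) -> float:
    """Partial sum of the cosine series Eq. (5.1), keeping terms up to n = N."""
    total = L**2 / 3
    for n in range(1, N + 1):
        total += (4 * L**2 / math.pi**2) * (-1) ** n / n**2 * math.cos(n * math.pi * x / L)
    return total


def sine_partial_sum(x: float, L: float, N: int) -> float:
    """Partial sum of the sine series Eq. (5.2), keeping terms up to n = N."""
    total = 0.0
    for n in range(1, N + 1):
        coeff = (-1) ** (n + 1) / n - 2 * (1 - (-1) ** n) / (math.pi**2 * n**3)
        total += (2 * L**2 / math.pi) * coeff * math.sin(n * math.pi * x / L)
    return total


if __name__ == "__main__":
    L = 1.0  # assumed interval length for the demonstration
    for N in (5, 50):
        for x in (0.5, 0.9, 1.0):
            err_cos = abs(cosine_partial_sum(x, L, N) - x**2)
            err_sin = abs(sine_partial_sum(x, L, N) - x**2)
            print(f"N={N:3d} x={x:.1f} cosine error={err_cos:.5f} sine error={err_sin:.5f}")
```

At x = L every sine vanishes, so the sine series misses the value $L^2$ entirely there no matter how many terms are kept, while the cosine series closes in on it.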

5.2 Computing Fourier Series 

How do you determine the details of these series starting from the original function? For the Taylor series, the trick was 
to assume a series to be an infinitely long polynomial and then to evaluate it (and its successive derivatives) at a point. 
You require that all of these values match those of the desired function at that one point. That method won’t work in 
this case. (Actually I've read that it can work here too, but with a ridiculous amount of labor and some mathematically 
suspect procedures.) 

The idea of Fourier’s procedure is like one that you can use to determine the components of a vector in three 
dimensions. You write such a vector as 

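$$\vec{A} = A_x\hat{x} + A_y\hat{y} + A_z\hat{z}$$

And then use the orthonormality of the basis vectors, $\hat{x}\cdot\hat{y} = 0$ etc. Take the scalar product of the preceding equation
with $\hat{x}$.

$$\hat{x}\cdot\vec{A} = \hat{x}\cdot\bigl(A_x\hat{x} + A_y\hat{y} + A_z\hat{z}\bigr) = A_x \quad\text{and}\quad \hat{y}\cdot\vec{A} = A_y \quad\text{and}\quad \hat{z}\cdot\vec{A} = A_z \tag{5.3}$$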


* For animated sequences showing the convergence of some of these series, see 
www.physics.miami.edu/nearing/mathmethods/animations.html

This lets you get all the components of $\vec{A}$. For example,

$$\hat{x}\cdot\vec{A} = A_x = A\cos\alpha, \qquad \hat{y}\cdot\vec{A} = A_y = A\cos\beta, \qquad \hat{z}\cdot\vec{A} = A_z = A\cos\gamma \tag{5.4}$$


This shows the three direction cosines for the vector $\vec{A}$. You will occasionally see these numbers used to describe vectors
in three dimensions, and it's easy to see that $\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$.
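
The projection idea is short enough to try directly. This is a minimal sketch with a made-up vector (the components 1, 2, 2 are only an example, and the helper names are hypothetical): each direction cosine comes from a dot product with a basis vector, and their squares add up to one.

```python
import math


def dot(u, v):
    """Ordinary scalar product of two 3-component vectors."""
    return sum(a * b for a, b in zip(u, v))


# Orthonormal basis vectors x-hat, y-hat, z-hat.
X_HAT, Y_HAT, Z_HAT = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def direction_cosines(A):
    """Return (cos alpha, cos beta, cos gamma) by projecting A on each basis vector."""
    magnitude = math.sqrt(dot(A, A))
    return tuple(dot(e, A) / magnitude for e in (X_HAT, Y_HAT, Z_HAT))


if __name__ == "__main__":
    A = (1.0, 2.0, 2.0)  # example vector, magnitude 3
    cosines = direction_cosines(A)
    print(cosines, sum(c * c for c in cosines))
```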

In order to stress the close analogy between this scalar product and what you do in Fourier series, I will introduce 
another notation for the scalar product. You don’t typically see it in introductory courses for the simple reason that it 
isn't needed there. Here however it will turn out to be very useful, and in the next chapter you will see nothing but this 
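notation. Instead of $\hat{x}\cdot\vec{A}$ or $\vec{A}\cdot\vec{B}$ you use $\langle\hat{x},\vec{A}\rangle$ or $\langle\vec{A},\vec{B}\rangle$. The angle bracket notation will make it very easy to
generalize the idea of a dot product to cover other things. In this notation the above equations will appear as

$$\langle\hat{x},\vec{A}\rangle = A\cos\alpha, \qquad \langle\hat{y},\vec{A}\rangle = A\cos\beta, \qquad \langle\hat{z},\vec{A}\rangle = A\cos\gamma$$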


and they mean exactly the same thing as Eq. (5.4). 

There are orthogonality relations similar to the ones for $\hat{x}$, $\hat{y}$, and $\hat{z}$, but for sines and cosines. Let n and m
represent integers, then

$$\int_0^L dx\,\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi x}{L}\right) = \begin{cases} 0 & n \ne m \\ L/2 & n = m \end{cases} \tag{5.5}$$

This is sort of like $\hat{x}\cdot\hat{z} = 0$ and $\hat{y}\cdot\hat{y} = 1$, where the analog of $\hat{x}$ is $\sin \pi x/L$ and the analog of $\hat{y}$ is $\sin 2\pi x/L$.
The biggest difference is that it doesn’t stop with three vectors in the basis; it keeps on with an infinite number of 
values of n and the corresponding different sines. There are an infinite number of very different possible functions, so 
you need an infinite number of basis functions in order to express a general function as a sum of them. The integral 
Eq. (5.5) is a continuous analog of the coordinate representation of the common dot product. The sum over three 
terms $A_xB_x + A_yB_y + A_zB_z$ becomes a sum (integral) over a continuous index, the integration variable. By using this
integral as a generalization of the ordinary scalar product, you can say that $\sin(\pi x/L)$ and $\sin(2\pi x/L)$ are orthogonal.
Let i be an index taking on the values x, y, and z, then the notation $A_i$ is a function of the variable i. In this case the
independent variable takes on just three possible values instead of the infinite number in Eq. (5.5).

How do you derive an identity such as Eq. (5.5)? The first method is just straight integration, using the right
trigonometric identities. The easier (and more general) method can wait for a few pages.

$$\cos(x \pm y) = \cos x\cos y \mp \sin x\sin y, \quad\text{subtract:}\quad \cos(x - y) - \cos(x + y) = 2\sin x\sin y$$


Use this in the integral. 

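$$\int_0^L dx\,\sin\left(\frac{n\pi x}{L}\right)\sin\left(\frac{m\pi x}{L}\right) = \frac{1}{2}\int_0^L dx\left[\cos\left(\frac{(n-m)\pi x}{L}\right) - \cos\left(\frac{(n+m)\pi x}{L}\right)\right]$$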

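Now do the integral, assuming $n \ne m$ and that n and m are positive integers.

$$= \frac{1}{2}\left[\frac{L}{(n-m)\pi}\sin\left(\frac{(n-m)\pi x}{L}\right) - \frac{L}{(n+m)\pi}\sin\left(\frac{(n+m)\pi x}{L}\right)\right]_0^L = 0 \tag{5.6}$$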


Why assume that the integers are positive? Aren't the negative integers allowed too? Yes, but they aren't needed. Put
n = −1 into $\sin(n\pi x/L)$ and you get the same function as for n = +1, but turned upside down. It isn't an independent
function, just −1 times what you already have. Using it would be sort of like using for your basis not only $\hat{x}$, $\hat{y}$, and $\hat{z}$
but $-\hat{x}$, $-\hat{y}$, and $-\hat{z}$ too. Do the n = m case of the integral yourself.
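
A quick numerical check of Eq. (5.5) is a useful complement to the algebra. The sketch below is only an illustration: the function name, the midpoint rule, and the number of steps are my choices, not anything prescribed by the text. It estimates the overlap integral for a few pairs of integers.

```python
import math


def sine_overlap(n: int, m: int, L: float = 1.0, steps: int = 2000) -> float:
    """Midpoint-rule estimate of the integral of sin(n pi x/L) sin(m pi x/L) over 0..L."""
    h = L / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h  # midpoint of the j-th subinterval
        total += math.sin(n * math.pi * x / L) * math.sin(m * math.pi * x / L)
    return total * h


if __name__ == "__main__":
    L = 2.0  # arbitrary interval length
    for n, m in ((1, 2), (2, 3), (3, 3), (1, 1)):
        print(n, m, sine_overlap(n, m, L))  # expect about 0 for n != m, about L/2 for n == m
```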

Computing an Example 

For a simple example, take the function f(x) = 1, the constant on the interval 0 < x < L, and assume that there is a
series representation for f on this interval.

$$1 = \sum_{n=1}^{\infty} a_n\sin\left(\frac{n\pi x}{L}\right) \qquad (0 < x < L) \tag{5.7}$$

Multiply both sides by the sine of $m\pi x/L$ and integrate from 0 to L.

$$\int_0^L dx\,\sin\left(\frac{m\pi x}{L}\right)1 = \int_0^L dx\,\sin\left(\frac{m\pi x}{L}\right)\sum_{n=1}^{\infty} a_n\sin\left(\frac{n\pi x}{L}\right) \tag{5.8}$$

Interchange the order of the sum and the integral, and the integral that shows up is the orthogonality integral derived
just above. When you use the orthogonality of the sines, only one term in the infinite series survives.

$$\int_0^L dx\,\sin\left(\frac{m\pi x}{L}\right)1 = \sum_{n=1}^{\infty} a_n\int_0^L dx\,\sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right) = \sum_{n=1}^{\infty} a_n\cdot\begin{cases} 0 & (n \ne m) \\ L/2 & (n = m) \end{cases} = a_mL/2 \tag{5.9}$$


Now all you have to do is to evaluate the integral on the left. 


$$\int_0^L dx\,\sin\left(\frac{m\pi x}{L}\right)1 = \frac{L}{m\pi}\left[-\cos\frac{m\pi x}{L}\right]_0^L = \frac{L}{m\pi}\left[1 - (-1)^m\right]$$


This is zero for even m, and when you equate it to (5.9) you get 


$$a_m = \frac{4}{m\pi} \quad\text{for } m \text{ odd}$$

You can relabel the indices so that the sum shows only odd integers m = 2k + 1 and the Fourier series is

$$\frac{4}{\pi}\sum_{m\ \text{odd} > 0}\frac{1}{m}\sin\frac{m\pi x}{L} = \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{1}{2k+1}\sin\frac{(2k+1)\pi x}{L} = 1, \qquad (0 < x < L) \tag{5.10}$$

[Figure: three panels of partial sums, labeled highest harmonic: 5, highest harmonic: 19, highest harmonic: 99]


The graphs show the sum of the series up to 2k + 1 = 5, 19, 99 respectively. It is not a very rapidly converging 
series, but it’s a start. You can see from the graphs that near the end of the interval, where the function is discontinuous, 
the series has a hard time handling the jump. The resulting overshoot is called the Gibbs phenomenon, and it is analyzed 
in section 5.7. 
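
The partial sums described here are easy to reproduce. The following is a minimal sketch, assuming L = 1 and a simple grid search for the peak; the function names and the grid size are illustrative choices, not from the text. It sums Eq. (5.10) up to a chosen highest harmonic and reports the largest value reached.

```python
import math


def square_partial_sum(x: float, L: float, highest: int) -> float:
    """Sum of Eq. (5.10) over odd m up to and including `highest`."""
    return (4 / math.pi) * sum(
        math.sin(m * math.pi * x / L) / m for m in range(1, highest + 1, 2)
    )


def peak_value(L: float, highest: int, samples: int = 4000) -> float:
    """Largest value of the partial sum on a uniform grid over 0 < x < L."""
    return max(
        square_partial_sum(L * (j + 0.5) / samples, L, highest) for j in range(samples)
    )


if __name__ == "__main__":
    L = 1.0  # assumed interval length
    for highest in (5, 19, 99):
        mid = square_partial_sum(L / 2, L, highest)
        print(f"highest harmonic {highest:2d}: value at L/2 = {mid:.4f}, peak = {peak_value(L, highest):.4f}")
```

The value at the middle of the interval settles toward 1 as harmonics are added, but the peak near the ends stays close to 1.18 instead of shrinking. That persistent overshoot is the Gibbs phenomenon.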
Notation

The point of introducing that other notation for the scalar product comes right here. The same notation is used for 
these integrals. In this context define 

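$$\langle f, g\rangle = \int_0^L dx\,f(x)^*g(x) \tag{5.11}$$

and it will behave just the same way that $\vec{A}\cdot\vec{B}$ does. Eq. (5.5) then becomes

$$\langle u_n, u_m\rangle = \begin{cases} 0 & n \ne m \\ L/2 & n = m \end{cases} \quad\text{where}\quad u_n(x) = \sin\left(\frac{n\pi x}{L}\right) \tag{5.12}$$

precisely analogous to $\langle\hat{x},\hat{x}\rangle = 1$ and $\langle\hat{y},\hat{z}\rangle = 0$.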


These $u_n$ are orthogonal to each other even though they aren't normalized to one the way that $\hat{x}$ and $\hat{y}$ are, but that
turns out not to matter. $\langle u_n, u_n\rangle = L/2$ instead of = 1, so you simply keep track of it. (What happens to the series
Eq. (5.7) if you multiply every $u_n$ by 2? Nothing, because the coefficients $a_n$ get multiplied by 1/2.)

The Fourier series manipulations, Eqs. (5.7), (5.8), (5.9), become

$$1 = \sum_{1}^{\infty} a_nu_n \quad\text{then}\quad \langle u_m, 1\rangle = \Bigl\langle u_m, \sum_{1}^{\infty} a_nu_n\Bigr\rangle = \sum_{n=1}^{\infty} a_n\langle u_m, u_n\rangle = a_m\langle u_m, u_m\rangle \tag{5.13}$$

This is far more compact than what you see in the steps between Eq. (5.7) and Eq. (5.10). You still have to evaluate the
integrals $\langle u_m, 1\rangle$ and $\langle u_m, u_m\rangle$, but when you master this notation you'll likely make fewer mistakes in figuring out
what integral you have to do. Again, you can think of Eq. (5.11) as a continuous analog of the discrete sum of three
terms, $\langle\vec{A},\vec{B}\rangle = A_xB_x + A_yB_y + A_zB_z$.
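
The compact form $a_m = \langle u_m, f\rangle/\langle u_m, u_m\rangle$ translates almost line for line into code. This is a sketch under stated assumptions: the integral is approximated by a midpoint rule, and `inner` and `sine_coefficient` are hypothetical helper names, not notation from the text.

```python
import math


def inner(f, g, L: float = 1.0, steps: int = 4000) -> complex:
    """Midpoint-rule estimate of <f, g>, the integral from 0 to L of conj(f(x)) g(x)."""
    h = L / steps
    total = 0j
    for j in range(steps):
        x = (j + 0.5) * h
        total += complex(f(x)).conjugate() * g(x)
    return total * h


def sine_coefficient(f, m: int, L: float = 1.0) -> float:
    """a_m = <u_m, f> / <u_m, u_m> with u_m(x) = sin(m pi x / L), for real f."""
    def u_m(x: float) -> float:
        return math.sin(m * math.pi * x / L)

    return (inner(u_m, f, L) / inner(u_m, u_m, L)).real


if __name__ == "__main__":
    def one(x: float) -> float:
        return 1.0

    for m in range(1, 6):
        # Compare with 4/(m pi) for odd m and 0 for even m.
        print(m, sine_coefficient(one, m))
```

Dividing by $\langle u_m, u_m\rangle$ rather than hard-coding L/2 is what lets the basis functions stay unnormalized.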

The analogy between the vectors such as $\hat{x}$ and functions such as sine is really far deeper, and it is central to
the subject of the next chapter. In order not to get confused by the notation, you have to distinguish between a whole
function f, and the value of that function at a point, f(x). The former is the whole graph of the function, and the
latter is one point of the graph, analogous to saying that $\vec{A}$ is the whole vector and $A_y$ is one of its components.

The scalar product notation defined in Eq. (5.11) is not necessarily restricted to the interval 0 < x < L. Depending
on context it can be over any interval that you happen to be considering at the time. In Eq. (5.11) there is a complex
conjugation symbol. The functions here have been real, so this made no difference, but you will often deal with complex
functions and then the fact that the notation $\langle f, g\rangle$ includes a conjugation is important. This notation is a special case
of the general development that will come in section 6.6. The basis vectors such as $\hat{x}$ are conventionally normalized to
one, $\hat{x}\cdot\hat{x} = 1$, but you don't have to require it even there, and in the context of Fourier series it would clutter up the
notation to require $\langle u_n, u_n\rangle = 1$, so I don't bother.

Some Examples 

To get used to this notation, try showing that these pairs of functions are orthogonal on the interval 0 < x < L. Sketch 
graphs of both functions in every case. 

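$$\big\langle x,\ L - \tfrac{3}{2}x \big\rangle = 0 \qquad \big\langle \sin \pi x/L,\ \cos \pi x/L \big\rangle = 0 \qquad \big\langle \sin 3\pi x/L,\ L - 2x \big\rangle = 0$$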

The notation has a complex conjugation built into it, but these examples are all real. What if they aren't? Show that 
these are orthogonal too. How do you graph these? Not easily.* 

$$\big\langle e^{2i\pi x/L},\ e^{-2i\pi x/L} \big\rangle = 0 \qquad \big\langle L - \tfrac{1}{4}(7 + i)x,\ L + \tfrac{3}{2}ix \big\rangle = 0$$
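If you want a numerical sanity check before doing these integrals by hand, a few lines of Python will do. This is only a sketch: the helper name `scalar_product`, the midpoint rule, and the choice L = 2 are my assumptions for illustration, not part of the text. It evaluates the scalar product of Eq. (5.11), conjugating the first function, for the pairs listed above.

```python
import cmath
import math

def scalar_product(f, g, a, b, steps=2000):
    """Approximate <f, g> = integral_a^b f(x)* g(x) dx with the midpoint rule."""
    h = (b - a) / steps
    total = 0j
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += complex(f(x)).conjugate() * g(x)  # conjugate the first argument
    return total * h

L = 2.0  # arbitrary interval length, chosen only for this check

pairs = {
    "x, L - 3x/2": (lambda x: x, lambda x: L - 1.5 * x),
    "sin(pi x/L), cos(pi x/L)": (
        lambda x: math.sin(math.pi * x / L),
        lambda x: math.cos(math.pi * x / L),
    ),
    "sin(3 pi x/L), L - 2x": (
        lambda x: math.sin(3 * math.pi * x / L),
        lambda x: L - 2 * x,
    ),
    "exp(2i pi x/L), exp(-2i pi x/L)": (
        lambda x: cmath.exp(2j * math.pi * x / L),
        lambda x: cmath.exp(-2j * math.pi * x / L),
    ),
    "L - (7+i)x/4, L + 3ix/2": (
        lambda x: L - 0.25 * (7 + 1j) * x,
        lambda x: L + 1.5j * x,
    ),
}

results = {name: scalar_product(f, g, 0.0, L) for name, (f, g) in pairs.items()}
for name, value in results.items():
    print(f"|<{name}>| = {abs(value):.2e}")
```

Every magnitude should come out at the level of the integration error, far below the size of the individual functions. That is evidence, not proof; the exercise is still to do the integrals.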


Extending the function 

In Equations (5.1) and (5.2) the original function was specified on the interval $0 < x < L$. The two Fourier series that
represent it can be evaluated for any $x$. Do they equal $x^2$ everywhere? No. The first series involves only cosines, so it
is an even function of $x$, but it's periodic: $f(x + 2L) = f(x)$. The second series has only sines, so it's odd, and it too
is periodic with period $2L$.




Here the discontinuity in the sine series is more obvious, a fact related to its slower convergence. 


* but see if you can find a copy of the book by Jahnke and Emde, published long before computers. They show 
examples. Also check out 

www.geom.uiuc.edu/~banchoff/script/CFGInd.html or 
www.math.ksu.edu/~bennett/jomacg/

5.3 Choice of Basis

When you work with components of vectors in two or three dimensions, you will choose the basis that is most convenient
for the problem you're working with. If you do a simple mechanics problem with a mass moving on an incline, you can
choose a basis $\hat x$ and $\hat y$ that are arranged horizontally and vertically. Or, you can place them at an angle so that they
point down the incline and perpendicular to it. The latter is often a simpler choice in that type of problem.

The same applies to Fourier series. The interval on which you're working is not necessarily from zero to $L$, and
even on the interval $0 < x < L$ you can choose many sets of functions for a basis:

$\sin(n\pi x/L)$ $(n = 1, 2, \ldots)$ as in equations (5.10) and (5.2), or you can choose a basis

$\cos(n\pi x/L)$ $(n = 0, 1, 2, \ldots)$ as in Eq. (5.1), or you can choose a basis

$\sin\big((n + 1/2)\pi x/L\big)$ $(n = 0, 1, 2, \ldots)$, or you can choose a basis

$e^{2\pi i n x/L}$ $(n = 0, \pm 1, \pm 2, \ldots)$, or an infinite number of other possibilities.

In order to use any of these you need a relation such as Eq. (5.5) for each separate case. That’s a lot of integration. 
You need to do it for any interval that you may need and that's even more integration. Fortunately there’s a way out: 

Fundamental Theorem 

If you want to show that each of these respective choices provides an orthogonal set of functions you can integrate
every special case as in Eq. (5.6), or you can do all the cases at once by deriving an important theorem. This theorem
starts from the fact that all of these sines and cosines and complex exponentials satisfy the same differential equation,
$u'' = \lambda u$, where $\lambda$ is some constant, different in each case. If you studied section 4.5, you saw how to derive properties
of trigonometric functions simply by examining the differential equation that they satisfy. If you didn't, now might be a
good time to look at it, because this is more of the same. (I'll wait.)

You have two functions $u_1$ and $u_2$ that satisfy

$$u_1'' = \lambda_1 u_1 \quad\text{and}\quad u_2'' = \lambda_2 u_2$$

Make no assumption about whether the $\lambda$'s are positive or negative or even real. The $u$'s can also be complex. Multiply
the first equation by $u_2^*$ and the second by $u_1^*$, then take the complex conjugate of the second product.

$$u_2^* u_1'' = \lambda_1 u_2^* u_1 \quad\text{and}\quad u_1 u_2^{*\prime\prime} = \lambda_2^* u_2^* u_1$$

Subtract the equations.

$$u_2^* u_1'' - u_1 u_2^{*\prime\prime} = (\lambda_1 - \lambda_2^*)\, u_2^* u_1$$

Integrate from $a$ to $b$

$$\int_a^b dx\, \big(u_2^* u_1'' - u_1 u_2^{*\prime\prime}\big) = (\lambda_1 - \lambda_2^*) \int_a^b dx\, u_2^* u_1 \tag{5.14}$$


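Now do two partial integrations. Work on the second term on the left:

$$\int_a^b dx\, u_1 u_2^{*\prime\prime} = u_1 u_2^{*\prime}\Big|_a^b - \int_a^b dx\, u_1' u_2^{*\prime} = u_1 u_2^{*\prime}\Big|_a^b - u_1' u_2^*\Big|_a^b + \int_a^b dx\, u_1'' u_2^*$$

Put this back into the Eq. (5.14) and the integral terms cancel, leaving

$$\Big(u_1' u_2^* - u_1 u_2^{*\prime}\Big)\Big|_a^b = (\lambda_1 - \lambda_2^*) \int_a^b dx\, u_2^* u_1 \tag{5.15}$$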


This is the central identity from which all the orthogonality relations in Fourier series derive. It's even more
important than that because it tells you what types of boundary conditions you can use in order to get the desired
orthogonality relations. (It tells you even more than that, as it tells you how to compute the adjoint of the second
derivative operator. But not now; save that for later.) The expression on the left side of the equation has a name:
"bilinear concomitant."

You can see how this is related to the work with the functions $\sin(n\pi x/L)$. They satisfy the differential equation
$u'' = \lambda u$ with $\lambda = -n^2\pi^2/L^2$. The interval in that case was $0 < x < L$ for $a < x < b$.

There are generalizations of this theorem that you will see in places such as problems 6.16 and 6.17 and 10.21. In
those extensions these same ideas will allow you to handle Legendre polynomials and Bessel functions and Ultraspherical
polynomials and many other functions in just the same way that you handle sines and cosines. That development comes
under the general name Sturm-Liouville theory.

The key to using this identity will be to figure out what sort of boundary conditions will cause the left-hand side
to be zero. For example if $u(a) = 0$ and $u(b) = 0$ then the left side vanishes. These are not the only possible boundary
conditions that make this work; there are several other common cases soon to appear.
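Before leaning on Eq. (5.15), it doesn't hurt to test it numerically on functions that satisfy no special boundary conditions at all. The sketch below makes several assumptions of my own: it uses $u = e^{\alpha x}$, which satisfies $u'' = \alpha^2 u$, with two arbitrary complex values of $\alpha$, an arbitrary interval, and a simple midpoint rule for the integral.

```python
import cmath

def integrate(f, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Hypothetical test functions: u = exp(alpha x) satisfies u'' = alpha^2 u.
alpha1, alpha2 = 0.3 + 1.1j, -0.7 + 0.4j
lam1, lam2 = alpha1**2, alpha2**2
a, b = 0.0, 1.5  # arbitrary interval

def u1(x):
    return cmath.exp(alpha1 * x)

def du1(x):
    return alpha1 * cmath.exp(alpha1 * x)

def u2(x):
    return cmath.exp(alpha2 * x)

def du2(x):
    return alpha2 * cmath.exp(alpha2 * x)

def concomitant(x):
    """u1' u2* - u1 u2*' evaluated at x (left side of Eq. (5.15) before taking limits)."""
    return du1(x) * u2(x).conjugate() - u1(x) * du2(x).conjugate()

lhs = concomitant(b) - concomitant(a)
rhs = (lam1 - lam2.conjugate()) * integrate(lambda x: u2(x).conjugate() * u1(x), a, b)
print(abs(lhs - rhs))
```

The two sides agree to within the integration error even though neither side is zero here. The boundary conditions discussed above are what make the left side vanish.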

The first consequence of Eq. (5.15) comes by taking a special case, the one in which the two functions $u_1$ and $u_2$
are in fact the same function. If the boundary conditions make the left side zero then

$$0 = (\lambda_1 - \lambda_1^*) \int_a^b dx\, u_1^*(x)\, u_1(x)$$

The $\lambda$'s are necessarily the same because the $u$'s are. The only way the product of two numbers can be zero is if one of
them is zero. The integrand, $u_1^*(x) u_1(x)$, is always non-negative and is continuous, so the integral can't be zero unless
the function $u_1$ is identically zero. As that would be a trivial case, assume it's not so. This then implies that the other
factor, $(\lambda_1 - \lambda_1^*)$, must be zero, and this says that the constant $\lambda_1$ is real. Yes, $-n^2\pi^2/L^2$ is real.

[To use another language that will become more familiar later, $\lambda$ is an eigenvalue and $d^2/dx^2$ with these boundary
conditions is an operator. This calculation guarantees that the eigenvalue is real.]

Now go back to the more general case of two different functions, and drop the complex conjugation on the $\lambda$'s.

$$0 = (\lambda_1 - \lambda_2) \int_a^b dx\, u_2^*(x)\, u_1(x)$$

This says that if the boundary conditions on $u$ make the left side zero, then for two solutions with different eigenvalues
($\lambda$'s) the orthogonality integral is zero. Eq. (5.5) is a special case of the following equation.

$$\text{If } \lambda_1 \ne \lambda_2, \text{ then}\quad \langle u_2, u_1\rangle = \int_a^b dx\, u_2^*(x)\, u_1(x) = 0 \tag{5.16}$$


Apply the Theorem 

As an example, carry out a full analysis of the case for which $a = 0$ and $b = L$, and for the boundary conditions $u(0) = 0$
and $u(L) = 0$. The parameter $\lambda$ is positive, zero, or negative. If $\lambda > 0$, then set $\lambda = k^2$ and

$$u(x) = A\sinh kx + B\cosh kx, \quad\text{then}\quad u(0) = B = 0 \quad\text{and so}\quad u(L) = A\sinh kL = 0 \Rightarrow A = 0$$

No solutions there, so try $\lambda = 0$

$$u(x) = A + Bx, \quad\text{then}\quad u(0) = A = 0 \quad\text{and so}\quad u(L) = BL = 0 \Rightarrow B = 0$$

No solutions here either. Try $\lambda < 0$, setting $\lambda = -k^2$.

$$u(x) = A\sin kx + B\cos kx, \quad\text{then}\quad u(0) = 0 = B, \quad\text{so}\quad u(L) = A\sin kL = 0$$

Now there are many solutions because $\sin n\pi = 0$ allows $k = n\pi/L$ with $n$ any integer. But, $\sin(-x) = -\sin(x)$ so
negative integers just reproduce the same functions as do the positive integers; they are redundant and you can eliminate
them. The complete set of solutions to the equation $u'' = \lambda u$ with these boundary conditions has $\lambda_n = -n^2\pi^2/L^2$ and
reproduces the result of the explicit integration as in Eq. (5.6).

$$u_n(x) = \sin\Big(\frac{n\pi x}{L}\Big), \quad n = 1, 2, 3, \ldots \quad\text{and}\quad \langle u_n, u_m\rangle = \int_0^L dx\, \sin\Big(\frac{n\pi x}{L}\Big)\sin\Big(\frac{m\pi x}{L}\Big) = 0 \quad\text{if } n \ne m \tag{5.17}$$


There are other choices of boundary condition that will make the bilinear concomitant vanish. (Verify these!) For
example

$$u(0) = 0, \quad u'(L) = 0 \quad\text{gives}\quad u_n(x) = \sin\big((n + 1/2)\pi x/L\big), \quad n = 0, 1, 2, 3, \ldots$$

and without further integration you have the orthogonality integral for non-negative integers $n$ and $m$

$$\langle u_n, u_m\rangle = \int_0^L dx\, \sin\Big(\frac{(n + 1/2)\pi x}{L}\Big)\sin\Big(\frac{(m + 1/2)\pi x}{L}\Big) = 0 \quad\text{if } n \ne m \tag{5.18}$$


A very common choice of boundary conditions is

$$u(a) = u(b), \quad u'(a) = u'(b) \quad\text{(periodic boundary conditions)} \tag{5.19}$$

It is often more convenient to use complex exponentials in this case (though of course not necessary). On $0 < x < L$

$$u(x) = e^{ikx}, \quad\text{where } k^2 = -\lambda \text{ and } u(0) = 1 = u(L) = e^{ikL}$$

The periodic behavior of the exponential implies that $kL = 2n\pi$. The condition that the derivatives match at the
boundaries makes no further constraint, so the basis functions are

$$u_n(x) = e^{2\pi i n x/L}, \quad (n = 0, \pm 1, \pm 2, \ldots) \tag{5.20}$$


Notice that in this case the index $n$ runs over all positive and negative integers and zero, not just the positive integers.
The functions $e^{2\pi i n x/L}$ and $e^{-2\pi i n x/L}$ are independent, unlike the case of the sines discussed above. Without including
both of them you don't have a basis and can't do Fourier series. If the interval is symmetric about the origin as it often
is, $-L < x < +L$, the conditions are

$$u(-L) = e^{-ikL} = u(+L) = e^{+ikL}, \quad\text{or}\quad e^{2ikL} = 1 \tag{5.21}$$

This says that $2kL = 2n\pi$, so

$$u_n(x) = e^{n\pi i x/L}, \quad (n = 0, \pm 1, \pm 2, \ldots) \quad\text{and}\quad f(x) = \sum_{-\infty}^{\infty} c_n u_n(x)$$

The orthogonality properties determine the coefficients:

$$\begin{aligned}
\langle u_m, f\rangle &= \Big\langle u_m, \sum_{-\infty}^{\infty} c_n u_n\Big\rangle = c_m \langle u_m, u_m\rangle \\
\int_{-L}^{L} dx\, e^{-m\pi i x/L} f(x) &= c_m \langle u_m, u_m\rangle = c_m \int_{-L}^{L} dx\, e^{-m\pi i x/L} e^{+m\pi i x/L} = c_m \int_{-L}^{L} dx\, 1 = 2L c_m
\end{aligned}$$
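To see the formula $c_m = \langle u_m, f\rangle / 2L$ produce numbers, here is a sketch for one test function. The choice $f(x) = x$, the value $L = 1$, and the midpoint rule are all my assumptions for illustration. For this $f$ a single integration by parts gives $c_m = iL(-1)^m/(m\pi)$ for $m \ne 0$, which the code compares against.

```python
import cmath
import math

L = 1.0  # arbitrary half-width of the interval -L < x < L

def coefficient(f, m, steps=4000):
    """c_m = <u_m, f> / (2L) with u_m(x) = exp(i m pi x / L), by the midpoint rule."""
    h = 2 * L / steps
    total = 0j
    for i in range(steps):
        x = -L + (i + 0.5) * h
        # The scalar product conjugates u_m, hence the minus sign in the exponent.
        total += cmath.exp(-1j * m * math.pi * x / L) * f(x)
    return total * h / (2 * L)

def f(x):
    """Hypothetical test function f(x) = x."""
    return x

numeric = {m: coefficient(f, m) for m in (1, 2, 3)}
exact = {m: 1j * L * (-1) ** m / (m * math.pi) for m in (1, 2, 3)}

for m in numeric:
    print(m, numeric[m], exact[m])
```

The coefficients are purely imaginary and $c_{-m} = -c_m$, so combining the $\pm m$ terms reproduces a pure sine series, as it must for an odd function.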


In this case, sometimes the real form of this basis is more convenient and you can use the combination of the two
sets $u_n$ and $v_n$, where

$$\begin{aligned}
u_n(x) &= \cos(n\pi x/L), \quad (n = 0, 1, 2, \ldots) \\
v_n(x) &= \sin(n\pi x/L), \quad (n = 1, 2, \ldots)
\end{aligned} \tag{5.22}$$

$$\langle u_n, u_m\rangle = 0 \ (n \ne m), \qquad \langle v_n, v_m\rangle = 0 \ (n \ne m), \qquad \langle u_n, v_m\rangle = 0 \ (\text{all } n, m)$$

and the Fourier series is a sum such as $f(x) = \sum_0^\infty a_n u_n + \sum_1^\infty b_n v_n$.

There are an infinite number of other choices, a few of which are even useful, e.g.

$$u'(a) = 0 = u'(b) \tag{5.23}$$

Take the same function as in Eq. (5.7) and try a different basis. Choose the basis for which the boundary conditions
are $u(0) = 0$ and $u'(L) = 0$. This gives the orthogonality conditions of Eq. (5.18). The general structure is always the
same.

$$f(x) = \sum_{0}^{\infty} a_n u_n(x), \quad\text{and use}\quad \langle u_m, u_n\rangle = 0 \quad (n \ne m)$$

Take the scalar product of this equation with $u_m$ to get

$$\langle u_m, f\rangle = \Big\langle u_m, \sum_{0}^{\infty} a_n u_n\Big\rangle = a_m \langle u_m, u_m\rangle \tag{5.24}$$

This is exactly as before in Eq. (5.13), but with a different basis. To evaluate it you still have to do the integrals.

$$\begin{aligned}
\langle u_m, f\rangle = \int_0^L dx\, \sin\Big(\frac{(m + 1/2)\pi x}{L}\Big)\, 1 &= a_m \int_0^L dx\, \sin^2\Big(\frac{(m + 1/2)\pi x}{L}\Big) = a_m \langle u_m, u_m\rangle \\
\frac{L}{(m + 1/2)\pi}\Big[1 - \cos\big((m + 1/2)\pi\big)\Big] &= \frac{L}{2}\, a_m \\
a_m &= \frac{4}{(2m + 1)\pi}
\end{aligned}$$

Then the series is

$$\frac{4}{\pi}\left[\sin\frac{\pi x}{2L} + \frac{1}{3}\sin\frac{3\pi x}{2L} + \frac{1}{5}\sin\frac{5\pi x}{2L} + \cdots\right] \tag{5.25}$$
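The coefficients in Eq. (5.25) are easy to check numerically. In the sketch below the names `u`, `integrate`, and `coefficient`, the value $L = 1$, and the midpoint rule are my own choices for illustration. It computes $a_m = \langle u_m, 1\rangle / \langle u_m, u_m\rangle$ directly, compares it with $4/((2m+1)\pi)$, checks one orthogonality integral from Eq. (5.18), and sums the series at the interior point $x = L/2$.

```python
import math

L = 1.0  # arbitrary interval length

def u(m, x):
    """Basis function satisfying u(0) = 0 and u'(L) = 0."""
    return math.sin((m + 0.5) * math.pi * x / L)

def integrate(f, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def coefficient(m):
    """a_m = <u_m, 1> / <u_m, u_m> for the function f(x) = 1."""
    numerator = integrate(lambda x: u(m, x), 0.0, L)
    denominator = integrate(lambda x: u(m, x) ** 2, 0.0, L)
    return numerator / denominator

coeffs = [coefficient(m) for m in range(5)]
expected = [4 / ((2 * m + 1) * math.pi) for m in range(5)]

# One orthogonality integral from Eq. (5.18), with n = 1 and m = 3.
overlap = integrate(lambda x: u(1, x) * u(3, x), 0.0, L)

# Partial sum of the series (5.25) at x = L/2; it should be close to f = 1.
partial_sum = sum(4 / ((2 * m + 1) * math.pi) * u(m, 0.5 * L) for m in range(2000))

print(coeffs, expected, overlap, partial_sum)
```

The convergence of the partial sum is slow, only like one over the number of terms, which is the same behavior you see in Eq. (5.10).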


5.4 Musical Notes 

Different musical instruments sound different even when playing the same note. You won’t confuse the sound of a piano 
with the sound of a guitar, and the reason is tied to Fourier series. The note middle C has a frequency that is 261.6 Hz on 
the standard equal tempered scale. The angular frequency is then $2\pi$ times this, or 1643.8 radians/sec. Call it $\omega_0 = 1644$.
When you play this note on any musical instrument, the result is always a combination of many frequencies, this one and
many multiples of it. A pure frequency has just $\omega_0$, but a real musical sound has many harmonics: $\omega_0$, $2\omega_0$, $3\omega_0$, etc.

$$\text{Instead of}\quad e^{i\omega_0 t} \quad\text{an instrument produces}\quad \sum_{n=1}^{?} a_n e^{ni\omega_0 t} \tag{5.26}$$


A pure frequency is the sort of sound that you hear from an electronic audio oscillator, and it’s not very interesting. Any 
real musical instrument will have at least a few and usually many frequencies combined to make what you hear. 

Why write this as a complex exponential? A sound wave is a real function of position and time, the pressure wave,
but it's easier to manipulate complex exponentials than sines and cosines, so when I write this, I really mean to take the
real part for the physical variable, the pressure variation. The imaginary part is carried along to be discarded later. Once
you're used to this convention you don't bother writing the "real part understood" anywhere; it's understood.

$$p(t) = \Re \sum_{n=1}^{?} a_n e^{ni\omega_0 t} = \sum_{n=1}^{?} |a_n| \cos(n\omega_0 t + \phi_n) \quad\text{where}\quad a_n = |a_n| e^{i\phi_n} \tag{5.27}$$

I wrote this using the periodic boundary conditions of Eq. (5.19). The period is the period of the lowest frequency,
$T = 2\pi/\omega_0$.

A flute produces a combination of frequencies that is mostly concentrated in a small number of harmonics, while
a violin or reed instrument produces a far more complex combination of frequencies. The size of the coefficients $a_n$ in
Eq. (5.26) determines the quality of the note that you hear, though oddly enough its phase, $\phi_n$, doesn't have an effect
on your perception of the sound.




These represent a couple of cycles of the sound of a clarinet. The left graph is about what the wave output of the
instrument looks like, and the right graph is what the graph would look like if I add a random phase, $\phi_n$, to each of the
Fourier components of the sound as in Eq. (5.27). They may look very different, but to the human ear they sound alike.
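You can reproduce the spirit of these two graphs numerically. The sketch below is not the clarinet data; the five amplitudes $|a_n|$ are made-up numbers and the random phases come from an arbitrary seed. It builds $p(t)$ from Eq. (5.27) twice, once with all $\phi_n = 0$ and once with random $\phi_n$, then compares the waveforms and their mean-square values over one period.

```python
import math
import random

omega0 = 1644.0                          # angular frequency of middle C, from the text
amplitudes = [1.0, 0.5, 0.3, 0.2, 0.1]   # hypothetical |a_n| for n = 1..5
period = 2 * math.pi / omega0

def pressure(t, phases):
    """p(t) = sum |a_n| cos(n omega0 t + phi_n), as in Eq. (5.27)."""
    return sum(
        a * math.cos((n + 1) * omega0 * t + phi)
        for n, (a, phi) in enumerate(zip(amplitudes, phases))
    )

def mean_square(phases, samples=4096):
    """Average of p(t)^2 over one period, sampled uniformly."""
    return sum(pressure(k * period / samples, phases) ** 2 for k in range(samples)) / samples

random.seed(0)  # arbitrary seed so the run is repeatable
zero_phases = [0.0] * len(amplitudes)
random_phases = [random.uniform(0, 2 * math.pi) for _ in amplitudes]

power_zero = mean_square(zero_phases)
power_random = mean_square(random_phases)
shape_difference = max(
    abs(pressure(k * period / 512, zero_phases) - pressure(k * period / 512, random_phases))
    for k in range(512)
)

print(power_zero, power_random, shape_difference)
```

The waveforms differ by an amount comparable to their own size, yet the mean-square value is the same, $\sum |a_n|^2/2$, because it depends only on the magnitudes. This shows only that the power in each harmonic is untouched by the phases; that the ear responds to that and not to the waveform shape is a statement about hearing, which the code can't test.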

You can hear examples of the sound of Fourier series online via the web site: 
courses.ee.sun.ac.za/Stelsels_en_Seine_315/wordpress/wp-content/uploads/jhu-signals/ 
and Listen to Fourier Series 

You can hear the (lack of) effect of phase on sound. You can also synthesize your own series and hear what they sound 
like under such links as “Fourier synthese” and “Harmonics applet" found on this page. You can back up from this link 
to larger topics by using the links shown in the left column of the web page. 

Real musical sound is of course more than just these Fourier series. At the least, the Fourier coefficients, a n , 
are themselves functions of time. The time scale on which they vary is however much longer than the basic period of 
oscillation of the wave. That means that it makes sense to treat them as (almost) constant when you are trying to 
describe the harmonic structure of the sound. Even the lowest pitch notes you can hear are at least 20 Hz, and few 
home sound systems can produce frequencies nearly that low. Musical notes change on time scales much greater than 
1/20 or 1/100 of a second, and this allows you to treat the notes by Fourier series even though the Fourier coefficients 
are themselves time-dependent. The attack and the decay of the note greatly affect our perception of it, and that is
described by the time-varying nature of these coefficients.*

* For an enlightening web page, including a complete and impressively thorough text on mathematics and music, look 
up the book by David Benson. It is available both in print from Cambridge Press and online, www.abdn.ac.uk/~mthl92/ 
(University of Aberdeen)

Parseval’s Identity

Let $u_n$ be the set of orthogonal functions that follow from your choice of boundary conditions.

$$f(x) = \sum_n a_n u_n(x)$$

Evaluate the integral of the absolute square of $f$ over the domain.

$$\begin{aligned}
\int_a^b dx\, |f(x)|^2 &= \int_a^b dx \left[\sum_m a_m u_m(x)\right]^* \left[\sum_n a_n u_n(x)\right] \\
&= \sum_m a_m^* \sum_n a_n \int_a^b dx\, u_m(x)^* u_n(x) = \sum_n |a_n|^2 \int_a^b dx\, |u_n(x)|^2
\end{aligned}$$

In the more compact notation this is

$$\langle f, f\rangle = \Big\langle \sum_m a_m u_m, \sum_n a_n u_n\Big\rangle = \sum_{m,n} a_m^* a_n \langle u_m, u_n\rangle = \sum_n |a_n|^2 \langle u_n, u_n\rangle \tag{5.28}$$

The first equation is nothing more than substituting the series for $f$. The second moved the integral under the summation.
The third equation uses the fact that all these integrals are zero except for the ones with $m = n$. That reduces the
double sum to a single sum. If you have chosen to normalize all of the functions $u_n$ so that the integrals of $|u_n(x)|^2$ are
one, then this relation takes on a simpler appearance. This is sometimes convenient.

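What does this say if you apply it to a series I've just computed? Take Eq. (5.10) and see what it implies.

$$\begin{aligned}
\langle f, f\rangle = \int_0^L dx\, 1 = L &= \sum_{k=0}^{\infty} |a_k|^2 \langle u_k, u_k\rangle \\
&= \sum_{k=0}^{\infty} \left(\frac{4}{\pi(2k+1)}\right)^2 \int_0^L dx\, \sin^2\left(\frac{(2k+1)\pi x}{L}\right) = \sum_{k=0}^{\infty} \left(\frac{4}{\pi(2k+1)}\right)^2 \frac{L}{2}
\end{aligned}$$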

Rearrange this to get 


$$\sum_{k=0}^{\infty} \frac{1}{(2k+1)^2} = \frac{\pi^2}{8} \tag{5.29}$$

A bonus. You have the sum of this infinite series, a result that would be quite perplexing if you see it without knowing
where it comes from. While you have it in front of you, what do you get if you simply evaluate the infinite series of
Eq. (5.10) at $L/2$? The answer is 1, but what is the other side?

$$1 = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\frac{(2k+1)\pi (L/2)}{L} = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} (-1)^k$$

or

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots = \frac{\pi}{4}$$
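Both sums are easy to test by brute force. This is a minimal sketch; the function names and the cutoff of 100000 terms are arbitrary choices of mine.

```python
import math

def odd_inverse_squares(terms):
    """Partial sum of 1/(2k+1)^2, the series in Eq. (5.29)."""
    return sum(1.0 / (2 * k + 1) ** 2 for k in range(terms))

def alternating_odd_reciprocals(terms):
    """Partial sum of (-1)^k/(2k+1) = 1 - 1/3 + 1/5 - ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))

sum_squares = odd_inverse_squares(100000)
sum_alternating = alternating_odd_reciprocals(100000)

print(sum_squares, math.pi ** 2 / 8)
print(sum_alternating, math.pi / 4)
```

Both partial sums are off by a few parts in a million after a hundred thousand terms. Neither series is a practical way to compute $\pi$, but the agreement is a good check on the algebra.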


But does it Work? 

If you are in the properly skeptical frame of mind, you may have noticed a serious omission on my part. I've done all this
work showing how to get orthogonal functions and to manipulate them to derive Fourier series for a general function,
but when did I show that this actually works? Never. How do I know that a general function, even a well-behaved
general function, can be written as such a series? I've proved that the functions $\sin(n\pi x/L)$ are orthogonal on
$0 < x < L$, but that's not good enough.

Maybe a clever mathematician will invent a new function that I haven't thought of and that will be orthogonal to
all of these sines and cosines that I'm trying to use for a basis, just as $\hat k$ is orthogonal to $\hat\imath$ and $\hat\jmath$. It won't happen. There
are proper theorems that specify the conditions under which all of this Fourier manipulation works. Dirichlet worked out
the key results, which are found in many advanced calculus texts.

For example if the function is continuous with a continuous derivative on the interval $0 < x < L$ then the Fourier
series will exist, will converge, and will converge to the specified function (except maybe at the endpoints). If you allow
it to have a finite number of finite discontinuities but with a continuous derivative in between, then the Fourier series will
converge and will (except maybe at the discontinuities) converge to the specified function. At these discontinuities it will
converge to the average value taken from the left and from the right. There are a variety of other sufficient conditions
that you can use to ensure that all of this stuff works, but I'll leave that to the advanced calculus books.

5.5 Periodically Forced ODE’s 

If you have a harmonic oscillator with an added external force, such as Eq. (4.12), there are systematic ways to solve it, 
such as those found in section 4.2. One part of the problem is to find a solution to the inhomogeneous equation, and 
if the external force is simple enough you can do this easily. Suppose though that the external force is complicated but 
periodic, as for example when you're pushing a child on a swing.

That the force is periodic means $F_{\text{ext}}(t) = F_{\text{ext}}(t + T)$ for all times $t$. The period is $T$.

Pure Frequency Forcing 

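Before attacking the general problem, look at a simple special case. Take the external forcing function to be $F_0 \cos\omega_e t$
where this frequency is $\omega_e = 2\pi/T$. This equation is now

$$m\frac{d^2x}{dt^2} + kx + b\frac{dx}{dt} = F_0 \cos\omega_e t = \frac{F_0}{2}\left[e^{i\omega_e t} + e^{-i\omega_e t}\right] \tag{5.30}$$

Find a solution corresponding to each term separately and add the results. To get an exponential out, put an exponential
in.

$$\text{for}\quad m\frac{d^2x}{dt^2} + kx + b\frac{dx}{dt} = e^{i\omega_e t} \quad\text{assume}\quad x_{\text{inh}}(t) = A e^{i\omega_e t}$$

Substitute the assumed form and it will determine $A$.

$$\left[m(-\omega_e^2) + b(i\omega_e) + k\right] A e^{i\omega_e t} = e^{i\omega_e t}$$

This tells you the value of $A$ is

$$A = \frac{1}{-m\omega_e^2 + b\, i\omega_e + k} \tag{5.31}$$

The other term in Eq. (5.30) simply changes the sign in front of $i$ everywhere. The total solution for Eq. (5.30) is then

$$x_{\text{inh}}(t) = \frac{F_0}{2}\left[\frac{1}{-m\omega_e^2 + b\, i\omega_e + k}\, e^{i\omega_e t} + \frac{1}{-m\omega_e^2 - b\, i\omega_e + k}\, e^{-i\omega_e t}\right] \tag{5.32}$$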

This is the sum of a number and its complex conjugate, so it's real. You can rearrange it so that it looks a lot simpler, 
but there's no need to do that right now. Instead I'll look at what it implies for certain values of the parameters. 

Suppose that the viscous friction is small ($b$ is small). If the forcing frequency, $\omega_e$, is such
that $-m\omega_e^2 + k = 0$, or is even close to zero, the denominators of the two terms become very
small. This in turn implies that the response of $x$ to the oscillating force is huge. Resonance.

See problem 5.27. In a contrasting case, look at $\omega_e$ very large. Now the response of the mass is
very small; it barely moves.
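Putting numbers into Eq. (5.31) makes these limits concrete. Since the two terms of Eq. (5.32) are complex conjugates, the solution is $F_0 |A| \cos(\omega_e t + \arg A)$, so the amplitude of the motion is $F_0 |A|$. The values of $m$, $b$, and $k$ below are arbitrary illustrative choices of mine, picked so that $\sqrt{k/m} = 10$ radians/sec.

```python
import math

def response_amplitude(omega_e, m, b, k, F0=1.0):
    """Amplitude F0 |A| of x_inh for the force F0 cos(omega_e t), with A from Eq. (5.31)."""
    A = 1.0 / complex(-m * omega_e ** 2 + k, b * omega_e)
    return F0 * abs(A)

# Hypothetical parameters: light damping, natural frequency sqrt(k/m) = 10 rad/s.
m, b, k = 1.0, 0.1, 100.0
natural = math.sqrt(k / m)

low = response_amplitude(0.1 * natural, m, b, k)    # well below resonance
resonant = response_amplitude(natural, m, b, k)     # at resonance
high = response_amplitude(10 * natural, m, b, k)    # well above resonance

print(low, resonant, high)
```

At resonance only the damping term survives in the denominator, so the amplitude is $F_0/(b\,\omega_e)$, and it grows without bound as $b \to 0$. Far above resonance the $m\omega_e^2$ term dominates and the mass barely moves.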

General Periodic Force 

Now I'll go back to the more general case of a periodic forcing function, but not one that is simply a cosine. If a function
is periodic I can use Fourier series to represent it on the whole axis. The basis to use will of course be the one with
periodic boundary conditions (what else?). Use complex exponentials, then

$$u(t) = e^{i\omega t} \quad\text{where}\quad e^{i\omega(t+T)} = e^{i\omega t}$$

This is just like Eq. (5.20) but with $t$ instead of $x$, so

$$u_n(t) = e^{2\pi i n t/T}, \quad (n = 0, \pm 1, \ldots) \tag{5.33}$$

Let $\omega_e = 2\pi/T$, and this is

$$u_n(t) = e^{in\omega_e t}$$

The external force can now be represented by the Fourier series

$$F_{\text{ext}}(t) = \sum_{k=-\infty}^{\infty} a_k e^{ik\omega_e t}, \quad\text{where}$$

$$\Big\langle e^{in\omega_e t}, \sum_{k=-\infty}^{\infty} a_k e^{ik\omega_e t}\Big\rangle = a_n T = \big\langle e^{in\omega_e t}, F_{\text{ext}}(t)\big\rangle = \int_0^T dt\, e^{-in\omega_e t} F_{\text{ext}}(t)$$

(Don't forget the implied complex conjugation in the definition of the scalar product, $\langle\ ,\ \rangle$, Eq. (5.11).) Because the force
is periodic, any other time interval of duration $T$ is just as good, perhaps $-T/2$ to $+T/2$ if that's more convenient.

How does this solve the differential equation? Plug in.


$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = \sum_{n=-\infty}^{\infty} a_n e^{in\omega_e t} \tag{5.34}$$

All there is to do now is to solve for an inhomogeneous solution one term at a time and then to add the results. Take
one term alone on the right:

$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = e^{ni\omega_e t}$$

This is what I just finished solving a few lines ago, Eq. (5.31), but with $n\omega_e$ instead of simply $\omega_e$. The inhomogeneous
solution is the sum of the solutions from each term.


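$$x_{\text{inh}}(t) = \sum_{n=-\infty}^{\infty} a_n \frac{1}{-m(n\omega_e)^2 + b\, i n\omega_e + k}\, e^{ni\omega_e t} \tag{5.35}$$

Suppose for example that the forcing function is a simple square wave.

$$F_{\text{ext}}(t) = \begin{cases} F_0 & (0 < t < T/2) \\ -F_0 & (T/2 < t < T) \end{cases} \quad\text{and}\quad F_{\text{ext}}(t + T) = F_{\text{ext}}(t) \tag{5.36}$$

The Fourier series for this function is one that you can do in problem 5.12. The result is

$$F_{\text{ext}}(t) = F_0\, \frac{2}{\pi i} \sum_{n\ \text{odd}} \frac{1}{n}\, e^{ni\omega_e t} \tag{5.37}$$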


The solution corresponding to Eq. (5.35) is now 


$$x_{\text{inh}}(t) = F_0\, \frac{2}{\pi i} \sum_{n\ \text{odd}} \frac{1}{-m(n\omega_e)^2 + i b n\omega_e + k}\, \frac{1}{n}\, e^{ni\omega_e t} \tag{5.38}$$


A real force ought to give a real result; does this? Yes. For every positive $n$ in the sum, there is a corresponding
negative one and the sum of those two is real. You can see this because every $n$ that appears is either squared or is
multiplied by an "$i$." When you add the $n = +5$ term to the $n = -5$ term it's adding a number to its own complex
conjugate, and that's real.

What peculiar features does this result imply? With the simple cosine force the phenomenon of resonance occurred,
in which a small force at a frequency that matched the intrinsic frequency $\sqrt{k/m}$ produced a
disproportionately large response. What other things happen here?

The natural frequency of the system is (for small damping) still $\sqrt{k/m}$. Look to see where a denominator in
Eq. (5.38) can become very small. This time it is when $-m(n\omega_e)^2 + k = 0$. This is not only when the external frequency
$\omega_e$ matches the natural frequency; it's when $n\omega_e$ matches it. If the natural frequency is $\sqrt{k/m} = 100$ radians/sec you
get a big response if the forcing frequency is 100 radians/sec or 33 radians/sec or 20 radians/sec or 14 radians/sec etc.
What does this mean? The square wave in Eq. (5.36) contains many frequencies. It contains more than just the main
frequency $2\pi/T$, it contains 3 times this and 5 times it and many higher frequencies. When any one of these harmonics
matches the natural frequency you will have the large resonant response.

Not only do you get a large response, look at the way the mass oscillates. If the force has a square wave frequency
20 radians/sec, the mass responds* with a large sinusoidal oscillation at a frequency 5 times higher: 100 radians/sec.
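Here is that statement in numbers. The sketch evaluates the size of each harmonic in Eq. (5.38), combining the $+n$ and $-n$ terms, for a square wave at 20 radians/sec driving an oscillator whose natural frequency is 100 radians/sec. The values of $m$, $b$, and $k$ are hypothetical, chosen only to give $\sqrt{k/m} = 100$ with light damping.

```python
import math

def harmonic_amplitude(n, omega_e, m, b, k, F0=1.0):
    """Amplitude of the n-th harmonic of x_inh, combining the +n and -n terms of Eq. (5.38)."""
    denominator = complex(-m * (n * omega_e) ** 2 + k, b * n * omega_e)
    term = F0 * (2 / (math.pi * 1j)) / (n * denominator)
    return 2 * abs(term)  # a number plus its complex conjugate has twice the magnitude

# Hypothetical parameters: sqrt(k/m) = 100 rad/s, light damping.
m, b, k = 1.0, 1.0, 1.0e4
omega_e = 20.0  # square-wave frequency in rad/s

amplitudes = {n: harmonic_amplitude(n, omega_e, m, b, k) for n in (1, 3, 5, 7, 9)}
dominant = max(amplitudes, key=amplitudes.get)

for n, amp in amplitudes.items():
    print(n, n * omega_e, amp)
print("dominant harmonic:", dominant)
```

With these numbers the $n = 5$ harmonic, at 100 radians/sec, is more than ten times larger than the response at the driving frequency itself, so the motion looks like a sinusoid at five times the frequency of the push.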


* The next time you have access to a piano, gently depress a key without making a sound, then strike the key one 
octave lower. Release the lower key and listen to the sound of the upper note. Then try it with an interval of an octave 
plus a fifth.

5.6 Return to Parseval

When you have a periodic wave such as a musical note, you can Fourier analyze it. The boundary conditions to use are
naturally the periodic ones, Eq. (5.20) or (5.33), so that

$$f(t) = \sum_{-\infty}^{\infty} a_n e^{ni\omega_0 t}$$

If this represents the sound of a flute, the amplitudes of the higher frequency components (the $a_n$) drop off rapidly with
$n$. If you are hearing an oboe or a violin the strength of the higher components is greater.

If this function represents the sound wave as received by your ear, the power that you receive is proportional to
the square of $f$. If $f$ represents specifically the pressure disturbance in the air, the intensity (power per area) carried by
the wave is $f(t)^2 v/B$ where $v$ is the speed of the wave and $B$ is the bulk modulus of the air. The key property of this is
that it is proportional to the square of the wave's amplitude. That's the same relation that occurs for light or any other
wave. Up to a known factor then, the power received by the ear is proportional to $f(t)^2$.

This time average of the power is (up to that constant factor that I'm ignoring)

$$\langle f^2 \rangle = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} dt\, f(t)^2$$

Now put the Fourier series representation of the sound into the integral to get

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} dt \left[\sum_{n=-\infty}^{\infty} a_n e^{ni\omega_0 t}\right]^2$$

The sound $f(t)$ is real, so by problem 5.11, $a_{-n} = a_n^*$. Also, using the result of problem 5.18 the time average of $e^{i\omega t}$
is zero unless $\omega = 0$; then it's one.

$$\begin{aligned}
\langle f^2 \rangle &= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} dt \left[\sum_n a_n e^{ni\omega_0 t}\right]\left[\sum_m a_m e^{mi\omega_0 t}\right] \\
&= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} dt\, \sum_n \sum_m a_n a_m\, e^{ni\omega_0 t} e^{mi\omega_0 t} \\
&= \sum_n \sum_m a_n a_m \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} dt\, e^{(n+m)i\omega_0 t} \\
&= \sum_n a_n a_{-n} \\
&= \sum_n |a_n|^2
\end{aligned} \tag{5.39}$$

Put this into words and it says that the time-average power received is the sum of many terms, each one of which can be
interpreted as the amount of power coming in at that frequency $n\omega_0$. The Fourier coefficients squared (absolute-squared
really) are then proportional to the part of the power at a particular frequency. The "power spectrum."
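Eq. (5.39) can be checked with a handful of coefficients. In this sketch the three complex values $a_1, a_2, a_3$ are made up, the negative-index coefficients are filled in from $a_{-n} = a_n^*$ so that $f$ is real, and the fundamental is set to $\omega_0 = 2\pi$ so that the period is one. Because $f$ is periodic, averaging over a single period gives the same answer as the $T \to \infty$ limit.

```python
import cmath
import math

omega0 = 2 * math.pi  # hypothetical fundamental; the period is then 1
positive = {1: 0.8 + 0.3j, 2: -0.2 + 0.5j, 3: 0.1 - 0.1j}  # made-up a_n for n > 0

coeffs = dict(positive)
coeffs.update({-n: a.conjugate() for n, a in positive.items()})  # a_{-n} = a_n^*

def f(t):
    """f(t) = sum a_n exp(i n omega0 t); real because a_{-n} = a_n^*."""
    value = sum(a * cmath.exp(1j * n * omega0 * t) for n, a in coeffs.items())
    return value.real  # the imaginary parts cancel in pairs

samples = 1024
period = 2 * math.pi / omega0
time_average = sum(f(k * period / samples) ** 2 for k in range(samples)) / samples
spectrum_sum = sum(abs(a) ** 2 for a in coeffs.values())

print(time_average, spectrum_sum)
```

The two numbers agree to rounding error. Each pair $\pm n$ contributes $2|a_n|^2$ to the total, which is the power at frequency $n\omega_0$.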

Other Applications 

In section 10.2 Fourier series will be used to solve partial differential equations, leading to equations such as Eq. (10.15). 

In quantum mechanics, Fourier series and its generalizations will manifest themselves in displaying the discrete 
energy levels of bound atomic and nuclear systems. 

Music synthesizers are all about Fourier series and its generalizations. 

5.7 Gibbs Phenomenon 

There's a picture of the Gibbs phenomenon with Eq. (5.10). When a function has a discontinuity, its Fourier series 
representation will not handle it in a uniform way, and the series overshoots its goal at the discontinuity. The detailed 
calculation of this result is quite pretty, and it’s an excuse to pull together several of the methods from the chapters on 
series and on complex algebra. 


$$\frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\frac{(2k+1)\pi x}{L} = 1, \qquad (0 < x < L)$$

[Figure: three partial sums of this series, with highest harmonic 5, highest harmonic 19, and highest harmonic 99]


The analysis sounds straightforward. Find the position of the first maximum. Evaluate the series there. It really
is almost that clear. First however, you have to start with a finite sum and find the first maximum of that. Stop the
sum at $k = N$.

$$\frac{4}{\pi} \sum_{k=0}^{N} \frac{1}{2k+1} \sin\frac{(2k+1)\pi x}{L} = f_N(x) \tag{5.40}$$

For a maximum, set the derivative to zero.

$$f_N'(x) = \frac{4}{L} \sum_{0}^{N} \cos\frac{(2k+1)\pi x}{L}$$


Write this as the real part of a complex exponential and use Eq. (2.3).

$$\sum_{0}^{N} e^{i(2k+1)\pi x/L} = \sum_{0}^{N} z^{2k+1} = z \sum_{0}^{N} z^{2k} = z\, \frac{1 - z^{2N+2}}{1 - z^2}$$

Factor these complex exponentials in order to put this into a nicer form.

$$= e^{i\pi x/L}\, \frac{e^{-i\pi x(N+1)/L} - e^{i\pi x(N+1)/L}}{e^{-i\pi x/L} - e^{i\pi x/L}}\, \frac{e^{i\pi x(N+1)/L}}{e^{i\pi x/L}} = \frac{\sin\big((N+1)\pi x/L\big)}{\sin(\pi x/L)}\, e^{i\pi x(N+1)/L}$$


The real part of this changes the last exponential into a cosine. Now you have the product of the sine and cosine of
$(N+1)\pi x/L$, and that lets you use the trigonometric double angle formula.

$$f_N'(x) = \frac{4}{L}\, \frac{\sin\big(2(N+1)\pi x/L\big)}{2\sin(\pi x/L)} \tag{5.41}$$

This is zero at the maximum. The first maximum after $x = 0$ is at $2(N+1)\pi x/L = \pi$, or $x = L/2(N+1)$.

Now for the value of $f_N$ at this point,

$$f_N\big(L/2(N+1)\big) = \frac{4}{\pi} \sum_{k=0}^{N} \frac{1}{2k+1} \sin\frac{(2k+1)\pi L/2(N+1)}{L} = \frac{4}{\pi} \sum_{k=0}^{N} \frac{1}{2k+1} \sin\frac{(2k+1)\pi}{2(N+1)}$$

The final step is to take the limit as $N \to \infty$. As $k$ varies over the set 0 to $N$, the argument of the sine varies from
a little more than zero to a little less than $\pi$. As $N$ grows you have the sum over a lot of terms, each of which is
approaching zero. It's an integral. Let $t_k = k/N$ then $\Delta t_k = 1/N$. This sum is approximately

$$\frac{4}{\pi} \sum_{k=0}^{N} \frac{1}{2N t_k} \sin t_k\pi = \frac{2}{\pi} \sum_{0}^{N} \Delta t_k\, \frac{1}{t_k} \sin t_k\pi \longrightarrow \frac{2}{\pi} \int_0^1 \frac{dt}{t}\, \sin \pi t$$

In this limit $2k+1$ and $2k$ are the same, and $N+1$ is the same as $N$.

Finally, put this into a standard form by changing variables to $\pi t = x$.

$$\frac{2}{\pi} \int_0^\pi dx\, \frac{\sin x}{x} = \frac{2}{\pi}\, \mathrm{Si}(\pi) = 1.17898, \qquad \int_0^x dt\, \frac{\sin t}{t} = \mathrm{Si}(x) \tag{5.42}$$

The function Si is called the "sine integral." It's just another tabulated function, along with erf, $\Gamma$, and others. This
equation says that as you take the limit of the series, the first part of the graph approaches a vertical line starting from
the origin, but it overshoots its target by 18%.
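The limit is easy to watch numerically. This sketch evaluates the finite sum $f_N$ at its first maximum, $x = L/2(N+1)$, where $L$ cancels, for a few values of $N$. It also computes $(2/\pi)\,\mathrm{Si}(\pi)$ by a simple midpoint rule. The function names, the particular values of $N$, and the integration method are my own choices; $N = 2, 9, 49$ correspond to highest harmonics 5, 19, and 99.

```python
import math

def partial_sum_at_first_max(N):
    """f_N of Eq. (5.40) evaluated at x = L/(2(N+1)); the length L cancels."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * math.pi / (2 * (N + 1))) / (2 * k + 1)
        for k in range(N + 1)
    )

def sine_integral(x, steps=20000):
    """Si(x) = integral_0^x sin(t)/t dt, by the midpoint rule (avoids t = 0)."""
    h = x / steps
    return sum(math.sin((i + 0.5) * h) / ((i + 0.5) * h) for i in range(steps)) * h

overshoot_limit = 2 / math.pi * sine_integral(math.pi)
peaks = {N: partial_sum_at_first_max(N) for N in (2, 9, 49, 2000)}

for N, peak in peaks.items():
    print(2 * N + 1, peak)
print("limit:", overshoot_limit)
```

The height of the first peak does not shrink toward 1 as you add terms. It settles down to the value in Eq. (5.42), while the position of the peak, $L/2(N+1)$, moves in toward the discontinuity.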


Exercises 

1 A vector is given to be $\vec A = 5\hat x + 3\hat y$. Let a new basis be $\hat e_1 = (\hat x + \hat y)/\sqrt{2}$, and $\hat e_2 = (\hat x - \hat y)/\sqrt{2}$. Use scalar
products to find the components of $\vec A$ in the new basis: $\vec A = A_1 \hat e_1 + A_2 \hat e_2$.

2 For the same vector as the preceding problem, and another basis $\vec f_1 = 3\hat x + 4\hat y$ and $\vec f_2 = -8\hat x + 6\hat y$, express $\vec A$ in
the new basis. Are these basis vectors orthogonal?

3 On the interval 0 < x < L, sketch three graphs: the first term alone, then the second term alone, then the third. Try 
to get the scale of the graphs reasonably accurate. Now add the first two and graph. Then add the third also and graph.
Do all this by hand, no graphing calculators, though if you want to use a calculator to calculate a few points, that's ok. 

$$\sin(\pi x/L) - \tfrac{1}{9}\sin(3\pi x/L) + \tfrac{1}{25}\sin(5\pi x/L)$$


4 For what values of $\alpha$ are the vectors $\vec A = \alpha\hat x - 2\hat y + \hat z$ and $\vec B = 2\alpha\hat x + \alpha\hat y - 4\hat z$ orthogonal?

5 On the interval $0 < x < L$ with a scalar product defined as $\langle f, g\rangle = \int_0^L dx\, f(x)^* g(x)$, show that these are zero,
making the functions orthogonal:

$x$ and $L - \tfrac{3}{2}x$, $\quad \sin \pi x/L$ and $\cos \pi x/L$, $\quad \sin 3\pi x/L$ and $L - 2x$

6 Same as the preceding, show that these functions are orthogonal:

$e^{2i\pi x/L}$ and $e^{-2i\pi x/L}$, $\quad L - \tfrac{1}{4}(7 + i)x$ and $L + \tfrac{3}{2}ix$

7 With the same scalar product as the last two exercises, for what values of $\alpha$ are the functions $f_1(x) = \alpha x - (1 - \alpha)(L - \tfrac{3}{2}x)$
and $f_2(x) = 2\alpha x + (1 + \alpha)(L - \tfrac{3}{2}x)$ orthogonal? What is the interpretation of the two roots?

8 Repeat the preceding exercise but use the scalar product $\langle f, g\rangle = \int_L^{2L} dx\, f(x)^* g(x)$.

9 Use the scalar product $\langle f, g\rangle = \int_{-1}^{1} dx\, f(x)^* g(x)$, and show that the Legendre polynomials $P_0$, $P_1$, $P_2$, $P_3$ of
Eq. (4.61) are mutually orthogonal.

10 Change the scalar product in the preceding exercise to $\langle f, g\rangle = \int_0^1 dx\, f(x)^* g(x)$ and determine if the same polynomials
are still orthogonal.

11 Verify the examples stated on page 136.

Problems

5.1 Get the results in Eq. (5.18) by explicitly calculating the integrals. 

5.2 (a) The functions with periodic boundary conditions, Eq. (5.20), are supposed to be orthogonal on $0 < x < L$.
That is, $\langle u_n, u_m\rangle = 0$ for $n \ne m$. Verify this by explicit integration. What is the result if $n = m$ or $n = -m$?
The notation is defined in Eq. (5.11). (b) Same calculation for the real version, $\langle u_n, u_m\rangle$, $\langle v_n, v_m\rangle$, and $\langle u_n, v_m\rangle$,
Eq. (5.22).

5.3 Find the Fourier series for the function $f(x) = 1$ as in Eq. (5.10), but use as a basis the set of functions $u_n$
on $0 < x < L$ that satisfy the differential equation $u'' = \lambda u$ with boundary conditions $u'(0) = 0$ and $u'(L) = 0$.
(Eq. (5.23)) Necessarily the first step will be to examine all the solutions to the differential equation and to find the cases
for which the bilinear concomitant vanishes.

(b) Graph the resulting Fourier series on —2 L < x < 2 L. 

(c) Graph the Fourier series Eq. (5.10) on —2 L < x < 2 L. 

5.4 (a) Compute the Fourier series for the function $x^2$ on the interval $0 < x < L$, using as a basis the functions with
boundary conditions $u'(0) = 0$ and $u'(L) = 0$.

(b) Sketch the partial sums of the series for 1, 2, 3 terms. Also sketch this sum outside the original domain and see what 
this series produces for an extension of the original function. Ans: Eq. (5.1) 

5.5 (a) Compute the Fourier series for the function $x$ on the interval $0 < x < L$, using as a basis the functions with
boundary conditions $u(0) = 0 = u(L)$. How does the coefficient of the $n^{\text{th}}$ term decrease as a function of $n$? (b) Also
sketch this sum within and outside the original domain to see what this series produces for an extension of the original
function.

Ans: $\frac{2L}{\pi}\sum_{1}^{\infty}\frac{(-1)^{n+1}}{n}\sin(n\pi x/L)$

5.6 (a) In the preceding problem the sine functions that you used don’t match the qualitative behavior of the function 
x on this interval because the sine is zero at x = L and x isn't. The qualitative behavior is different from the 
basis you are using for the expansion. You should be able to get better convergence for the series if you choose 
functions that more closely match the function that you’re expanding, so try repeating the calculation using basis 
functions that satisfy $u(0) = 0$ and $u'(L) = 0$. How does the coefficient of the $n^{\text{th}}$ term decrease as a function of $n$?
(b) As in the preceding problem, sketch some partial sums of the series and its extension outside the original domain.

Ans: $\frac{8L}{\pi^2}\sum_{0}^{\infty}\bigl((-1)^n/(2n+1)^2\bigr)\sin\bigl((n + 1/2)\pi x/L\bigr)$

5.7 The function $\sin^2 x$ is periodic with period $\pi$. What is its Fourier series representation using as a basis functions
that have this period? Eqs. (5.20) or (5.22).

5.8 In the two problems 5.5 and 5.6 you improved the convergence by choosing boundary conditions that better matched
the function that you want. Can you do better? The function $x$ vanishes at the origin, but its derivative isn't zero at $L$,
so try boundary conditions $u(0) = 0$ and $u(L) = Lu'(L)$. These conditions match those of $x$ so this ought to give even
better convergence, but first you have to verify that these conditions guarantee the orthogonality of the basis functions.
You have to verify that the left side of Eq. (5.15) is in fact zero. When you set up the basis, you will examine functions
of the form $\sin kx$, but you will not be able to solve explicitly for the values of $k$. Don't worry about it. When you use
Eq. (5.24) to get the coefficients all that you need to do is to use the equation that $k$ satisfies to do the integrals. You
do not need to have solved it. If you do all the algebra correctly you will probably have a surprise.

5.9 (a) Use the periodic boundary conditions on $-L < x < +L$ and basis $e^{\pi i n x/L}$ to write $x^2$ as a Fourier series.
Sketch the sums up to a few terms. (b) Evaluate your result at $x = L$ where you know the answer to be $L^2$ and deduce
from this the value of $\zeta(2)$.

5.10 On the interval $-\pi < x < \pi$, the function $f(x) = \cos x$. Expand this in a Fourier series defined by $u'' = \lambda u$ and
$u(-\pi) = 0 = u(\pi)$. If you use your result for the series outside of this interval you define an extension of the original
function. Graph this extension and compare it to what you normally think of as the graph of $\cos x$. As always, go back
to the differential equation to get all the basis functions.

Ans: $-\frac{4}{\pi}\sum_{k=0}^{\infty}\frac{2k+1}{(2k+3)(2k-1)}\sin\bigl((2k+1)(x+\pi)/2\bigr)$

5.11 Represent a function $f$ on the interval $-L < x < L$ by a Fourier series using periodic boundary conditions

$$f(x) = \sum_{n=-\infty}^{\infty} a_n e^{n\pi i x/L}$$

(a) If the function $f$ is odd, prove that for all $n$, $a_{-n} = -a_n$.

(b) If the function $f$ is even, prove that all $a_{-n} = a_n$.

(c) If the function $f$ is real, prove that all $a_{-n} = a_n^*$.

(d) If the function is both real and even, characterize $a_n$.

(e) If the function is imaginary and odd, characterize $a_n$.


5.12 Derive the series Eq. (5.37).

5.13 For the function $e^{-\alpha t}$ on $0 < t < T$, express it as a Fourier series using periodic boundary conditions [$u(0) = u(T)$
and $u'(0) = u'(T)$]. Examine for plausibility the cases of large and small $\alpha$. The basis functions for periodic boundary
conditions can be expressed either as cosines and sines or as complex exponentials. Unless you can analyze the problem
ahead of time and determine that it has some special symmetry that matches that of the trig functions, you’re usually
better off with the exponentials. Ans: $\bigl[(1 - e^{-\alpha T})/\alpha T\bigr]\Bigl[1 + 2\sum_{1}^{\infty}\bigl[\alpha^2\cos n\omega t + \alpha n\omega\sin n\omega t\bigr]\big/\bigl[\alpha^2 + n^2\omega^2\bigr]\Bigr]$

5.14 (a) On the interval 0 < x < L, write x(L — x) as a Fourier series, using boundary conditions that the expansion 
functions vanish at the endpoints. Next, evaluate the series at x = L/2 to see if it gives an interesting result, (b) Finally, 
what does Parseval's identity tell you? 

Ans: $\frac{4L^2}{\pi^3}\sum_{1}^{\infty}\Bigl[\frac{1 - (-1)^n}{n^3}\Bigr]\sin(n\pi x/L)$

5.15 A full-wave rectifier takes as an input a sine wave, $\sin\omega t$, and creates the output $f(t) = |\sin\omega t|$. The period
of the original wave is $2\pi/\omega$, so write the Fourier series for the output in terms of functions periodic with this period.
Graph the function $f$ first and use the graph to anticipate which terms in the Fourier series will be present.

When you're done, use the result to evaluate the infinite series $\sum_{1}^{\infty}(-1)^{k+1}/(4k^2 - 1)$.

Ans: $\pi/4 - 1/2$

5.16 A half-wave rectifier takes as an input a sine wave, $\sin\omega t$, and creates the output

$\sin\omega t$ if $\sin\omega t > 0$ and $0$ if $\sin\omega t \le 0$

The period of the original wave is $2\pi/\omega$, so write the Fourier series for the output in terms of functions periodic with
this period. Graph the function first. Check that the result gives the correct value at $t = 0$, manipulating it into a
telescoping series. Sketch a few terms of the whole series to see if it's heading in the right direction.

Ans: $1/\pi + \tfrac{1}{2}\sin\omega t - (2/\pi)\sum_{n\ \mathrm{even}\,>0}\cos(n\omega t)/(n^2 - 1)$

5.17 For the undamped harmonic oscillator, apply an oscillating force (cosine). This is a simpler version of Eq. (5.30). 
Solve this problem and add the general solution to the homogeneous equation. Solve this subject to the initial conditions 
that $x(0) = 0$ and $v_x(0) = v_0$.

5.18 The average (arithmetic mean) value of a function is

$$\langle f\rangle = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} dt\, f(t) \qquad\text{or}\qquad \langle f\rangle = \lim_{T\to\infty}\frac{1}{T}\int_{0}^{T} dt\, f(t)$$

as appropriate for the problem.

What is $\langle\sin\omega t\rangle$? What is $\langle\sin^2\omega t\rangle$? What is $\langle e^{-at^2}\rangle$?

What is $\langle\sin\omega_1 t\,\sin\omega_2 t\rangle$? What is $\langle e^{i\omega t}\rangle$?

5.19 In the calculation leading to Eq. (5.39) I assumed that $f(t)$ is real and then used the properties of $a_n$ that followed
from that fact. Instead, make no assumption about the reality of $f(t)$ and compute

$$\bigl\langle |f(t)|^2\bigr\rangle = \bigl\langle f(t)^* f(t)\bigr\rangle$$

Show that it leads to the same result as before, $\sum |a_n|^2$.

5.20 The series

$$\sum_{n=0}^{\infty} a^n\cos n\theta \qquad (|a| < 1)$$

represents a function. Sum this series and determine what the function is. While you’re about it, sum the similar series
that has a sine instead of a cosine. Don't try to do these separately; combine them and do them as one problem. And
check some limiting cases of course. And graph the functions. Ans: $a\sin\theta/(1 + a^2 - 2a\cos\theta)$

5.21 Apply Parseval's theorem to the result of problem 5.9 and see what you can deduce. 

5.22 If you take all the elements u n of a basis and multiply each of them by 2, what happens to the result for the 
Fourier series for a given function? 

5.23 In the section 5.3 several bases are mentioned. Sketch a few terms of each basis. 

5.24 A function is specified on the interval $0 < t < T$ to be

$$f(t) = \begin{cases} 1 & (0 < t < t_0) \\ 0 & (t_0 < t < T) \end{cases} \qquad 0 < t_0 < T$$

On this interval, choose boundary conditions such that the left side of the basic identity (5.15) is zero. Use the
corresponding choice of basis functions to write $f$ as a Fourier series on this interval.


5.25 Show that the boundary conditions $u(0) = 0$ and $\alpha u(L) + \beta u'(L) = 0$ make the bilinear concomitant in Eq. (5.15)
vanish. Are there any restrictions on $\alpha$ and $\beta$? Do not automatically assume that these numbers are real.

5.26 Derive a Fourier series for the function

$$f(x) = \begin{cases} Ax & (0 < x < L/2) \\ A(L - x) & (L/2 < x < L) \end{cases}$$

Choose the Fourier basis that you prefer. Evaluate the resulting series at $x = L/2$ to check the result. Sketch the sum
of a couple of terms. Comment on the convergence properties of the result. Are they what you expect? What does
Parseval's identity say?

Ans: $(4AL/\pi^2)\sum_{k\ \mathrm{odd}}(-1)^{(k-1)/2}\sin(k\pi x/L)/k^2$

5.27 Rearrange the solution Eq. (5.32) into a more easily understood form. (a) Write the first denominator as

$$-m\omega_e^2 + ib\omega_e + k = Re^{i\phi}$$

What are $R$ and $\phi$? The second term does not require you to repeat this calculation, just use its results. Now combine
everything and write the answer as an amplitude times a phase-shifted cosine.

(b) Assume that $b$ is not too big and plot both $R$ and $\phi$ versus the forcing frequency $\omega_e$. Also, and perhaps more
illuminating, plot $1/R$.

5.28 Find the form of Parseval’s identity appropriate for power series. Assume a scalar product $\langle f, g\rangle = \int_{-1}^{1} f(x)^* g(x)\,dx$
for the series $f(x) = \sum_{0}^{\infty} a_n x^n$, and $g(x) = \sum_{0}^{\infty} b_n x^n$, expressing the result in terms of matrices. Next, test your result
on a simple, low-order polynomial.

Ans: $(a_0^*\ a_1^*\ \ldots)\,M\,(b_0\ b_1\ \ldots)^{\sim}$ where $M_{00} = 2$, $M_{02} = 2/3$, $M_{04} = 2/5$, $\ldots$ and $\sim$ is transpose.

5.29 (a) In the Gibbs phenomenon, after the first maximum there is a first minimum. Where is it? How big is the
function there? What is the limit of this point? That is, repeat the analysis of section 5.7 for this minimum point.

(b) While you’re about it, what will you get for the limit of the sine integral, $\mathrm{Si}(\infty)$? The last result can also be derived
by complex variable techniques of chapter 14, Eq. (14.16). Ans: $(2/\pi)\,\mathrm{Si}(2\pi) = 0.9028$

5.30 Make a blown-up copy of the graph preceding Eq. (5.40) and measure the size of the overshoot. Compare this 
experimental value to the theoretical limit. Same for the first minimum. 

5.31 Find the power series representation about the origin for the sine integral, Si, that appeared in Eq. (5.42). What 
is its domain of convergence? 

Ans: $\sum_{n=0}^{\infty}(-1)^n x^{2n+1}\big/\bigl((2n+1)(2n+1)!\bigr)$

5.32 An input potential in a circuit is given to be a square wave $\pm V_0$ at frequency $\omega$. What
is the voltage between the points $a$ and $b$? In particular, assume that the resistance is small,
and show that you can pick values of the capacitance and the inductance so that the output
potential is almost exactly a sine wave at frequency $3\omega$. A filter circuit. Recall section 4.8.


5.33 For the function $\sin(\pi x/L)$ on $(0 < x < 2L)$, expand it in a Fourier series using as a basis the trigonometric
functions with the boundary conditions $u'(0) = 0 = u'(2L)$, the cosines. Graph the resulting series as extended outside
the original domain.

5.34 For the function $\cos(\pi x/L)$ on $(0 < x < 2L)$, expand it in a Fourier series using as a basis the trigonometric
functions with the boundary conditions $u(0) = 0 = u(2L)$, the sines. Graph the resulting series as extended outside the
original domain.

5.35 (a) For the function $f(x) = x^4$, evaluate the Fourier series on the interval $-L < x < L$ using periodic boundary
conditions ($u(-L) = u(L)$ and $u'(-L) = u'(L)$). (b) Evaluate the series at the point $x = L$ to derive the zeta function
value $\zeta(4) = \pi^4/90$. Evaluate it at $x = 0$ to get a related series.

Ans: $\tfrac{1}{5}L^4 + L^4\sum_{1}^{\infty}(-1)^n\Bigl[\frac{8}{\pi^2 n^2} - \frac{48}{\pi^4 n^4}\Bigr]\cos n\pi x/L$


5.36 Fourier series depends on the fact that the sines and cosines are orthogonal when integrated over a suitable interval. 
There are other functions that allow this too, and you've seen one such set. The Legendre polynomials that appeared in 
section 4.11 in the chapter on differential equations satisfied the equations (4.62). One of these is 



$$\int_{-1}^{1} dx\, P_n(x)P_m(x) = \frac{2}{2n+1}\,\delta_{nm}$$

This is an orthogonality relation, $\langle P_n, P_m\rangle = 2\delta_{nm}/(2n+1)$, much like that for trigonometric functions. Write a
function $f(x) = \sum_{0}^{\infty} a_n P_n(x)$ and deduce an expression for evaluating the coefficients $a_n$. Apply this to the function
$f(x) = x^2$.


5.37 For the standard differential equation $u'' = \lambda u$, use the boundary conditions $u(0) = 0$ and $2u(L) = Lu'(L)$. This
is a special case of problem 5.25, so the bilinear concomitant vanishes. If you haven’t done that problem, at least do this
special case. Find all the solutions that satisfy these conditions and graph a few of them. You will not be able to find
an explicit solution for the $\lambda$s, but you can estimate a few of them to sketch graphs. Did you get them all?

5.38 Examine the function on $-L < x < L$ given by

$$f(x) = \begin{cases} 0 & (-L < x < -L/2) \text{ and } (L/2 < x < L) \\ 1 & (0 < x < L/2) \\ -1 & (-L/2 < x < 0) \end{cases}$$

Draw it first. Now find a Fourier series representation for it. You may choose to do this by doing lots of integrals, OR 
you may prefer to start with some previous results in this chapter, change periods, add or subtract, and do no integrals 
at all. 

5.39 In Eq. (5.30) I wrote $\cos\omega_e t$ as the sum of two exponentials, $e^{i\omega_e t} + e^{-i\omega_e t}$. Instead, write the cosine as $e^{i\omega_e t}$ with
the understanding that at the end you take the real part of the result. Show that the result is the same.

5.40 From Eq. (5.41) write an approximate closed form expression for the partial sum $f_N(x)$ for the region $x \ll L$ but
not necessarily $x \ll NL$, though that extra-special case is worth doing too.

5.41 Evaluate the integral $\int_0^L dx\, x^2$ using the series Eq. (5.1) and using the series (5.2).

5.42 The Fourier series in problem 5.5 uses the same basis as the series Eq. (5.10). What is the result of evaluating the
scalar products $\langle 1, 1\rangle$ and $\langle 1, x\rangle$ with these series?

5.43 If you evaluated the $n = m$ case of Eq. (5.6) by using a different trig identity, you can do it by an alternative
method: say that $n$ and $m$ in this equation aren't necessarily integers. Then take the limit as $n \to m$.


Vector Spaces


The idea of vectors dates back to the middle 1800’s, but our current understanding of the concept waited until Peano’s 
work in 1888. Even then it took many years to understand the importance and generality of the ideas involved. This one 
underlying idea can be used to describe the forces and accelerations in Newtonian mechanics and the potential functions 
of electromagnetism and the states of systems in quantum mechanics and the least-square fitting of experimental data 
and much more. 

6.1 The Underlying Idea 

What is a vector? 

If your answer is along the lines “something with magnitude and direction” then you have something to unlearn. 
Maybe you heard this definition in a class that I taught. If so, I lied; sorry about that. At the very least I didn’t tell the 
whole truth. Does an automobile have magnitude and direction? Does that make it a vector? 

The idea of a vector is far more general than the picture of a line with an arrowhead attached to its end. That 
special case is an important one, but it doesn’t tell the whole story, and the whole story is one that unites many areas 
of mathematics. The short answer to the question of the first paragraph is 

A vector is an element of a vector space. 

Roughly speaking, a vector space is some set of things for which the operation of addition is defined and the 
operation of multiplication by a scalar is defined. You don’t necessarily have to be able to multiply two vectors by each 
other or even to be able to define the length of a vector, though those are very useful operations and will show up in 
most of the interesting cases. You can add two cubic polynomials together: 

$$(2 - 3x + 4x^2 - 7x^3) + (-8 - 2x + 11x^2 + 9x^3)$$

makes sense, resulting in a cubic polynomial. You can multiply such a polynomial by 17* and it's still a cubic polynomial.
The set of all cubic polynomials in x forms a vector space and the vectors are the individual cubic polynomials. 

The common example of directed line segments (arrows) in two or three dimensions fits this idea, because you can 
add such arrows by the parallelogram law and you can multiply them by numbers, changing their length (and reversing 
direction for negative numbers). 


* The physicist’s canonical random number

Another, equally important example consists of all ordinary real-valued functions of a real variable: two such
functions can be added to form a third one, and you can multiply a function by a number to get another function. The 
example of cubic polynomials above is then a special case of this one. 
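To make the polynomial example concrete, here is a small sketch that stores a cubic as its list of four coefficients, lowest power first. The representation and the helper names `add`, `scale`, and `evaluate` are choices made for this illustration only. Adding coefficient lists gives the same polynomial as adding the two functions point by point.

```python
def add(p, q):
    # Vector addition: add corresponding coefficients.
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    # Multiplication by a scalar: multiply every coefficient by c.
    return [c * a for a in p]

def evaluate(p, x):
    # Value of the polynomial at the point x.
    return sum(a * x**n for n, a in enumerate(p))

p = [2, -3, 4, -7]    # 2 - 3x + 4x^2 - 7x^3
q = [-8, -2, 11, 9]   # -8 - 2x + 11x^2 + 9x^3

s = add(p, q)
print(s)              # [-6, -5, 15, 2], still a cubic
print(scale(17, p))   # [34, -51, 68, -119]
print(evaluate(s, 2), evaluate(p, 2) + evaluate(q, 2))  # 60 60
```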

A complete definition of a vector space requires pinning down these ideas and making them less vague. In the end, 
the way to do that is to express the definition as a set of axioms. From these axioms the general properties of vectors 
will follow. 

A vector space is a set whose elements are called “vectors" and such that there are two operations defined on 
them: you can add vectors to each other and you can multiply them by scalars (numbers). These operations must obey 
certain simple rules, the axioms for a vector space. 

6.2 Axioms 

The precise definition of a vector space is given by listing a set of axioms. For this purpose, I'll denote vectors by arrows 
over a letter, and I’ll denote scalars by Greek letters. These scalars will, for our purpose, be either real or complex 
numbers — it makes no difference which for now.* 

1 There is a function, addition of vectors, denoted $+$, so that $\vec v_1 + \vec v_2$ is another vector.

2 There is a function, multiplication by scalars, denoted by juxtaposition, so that $\alpha\vec v$ is a vector.

3 $(\vec v_1 + \vec v_2) + \vec v_3 = \vec v_1 + (\vec v_2 + \vec v_3)$ (the associative law).

4 There is a zero vector, so that for each $\vec v$, $\vec v + \vec O = \vec v$.

5 There is an additive inverse for each vector, so that for each $\vec v$, there is another vector $\vec v\,'$ so that $\vec v + \vec v\,' = \vec O$.

6 The commutative law of addition holds: $\vec v_1 + \vec v_2 = \vec v_2 + \vec v_1$.

7 $(\alpha + \beta)\vec v = \alpha\vec v + \beta\vec v$.

8 $(\alpha\beta)\vec v = \alpha(\beta\vec v)$.

9 $\alpha(\vec v_1 + \vec v_2) = \alpha\vec v_1 + \alpha\vec v_2$.

10 $1\vec v = \vec v$.

In axioms 1 and 2 I called these operations “functions." Is that the right use of the word? Yes. Without going 
into the precise definition of the word (see section 12.1), you know it means that you have one or more independent 
variables and you have a single output. Addition of vectors and multiplication by scalars certainly fit that idea. 


* For a nice introduction online see distance-ed.math.tamu.edu/Math640, chapter three.

6.3 Examples of Vector Spaces

Examples of sets satisfying these axioms abound: 

1 The usual picture of directed line segments in a plane, using the parallelogram law of addition. 

2 The set of real-valued functions of a real variable, defined on the domain $[a < x < b]$. Addition is defined pointwise.
If $f_1$ and $f_2$ are functions, then the value of the function $f_1 + f_2$ at the point $x$ is the number $f_1(x) + f_2(x)$. That
is, $f_1 + f_2 = f_3$ means $f_3(x) = f_1(x) + f_2(x)$. Similarly, multiplication by a scalar is defined as $(\alpha f)(x) = \alpha(f(x))$.
Notice a small confusion of notation in this expression. The first multiplication, $(\alpha f)$, multiplies the scalar $\alpha$ by the
vector $f$; the second multiplies the scalar $\alpha$ by the number $f(x)$.

3 Like example 2, but restricted to continuous functions. The one observation beyond the previous example is that 
the sum of two continuous functions is continuous. 

4 Like example 2, but restricted to bounded functions. The one observation beyond the previous example is that the 
sum of two bounded functions is bounded. 

5 The set of n-tuples of real numbers: $(a_1, a_2, \ldots, a_n)$ where addition and scalar multiplication are defined by

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n) \qquad \alpha(a_1, \ldots, a_n) = (\alpha a_1, \ldots, \alpha a_n)$$

6 The set of square-integrable real-valued functions of a real variable on the domain $[a < x < b]$. That is, restrict
example two to those functions with $\int_a^b dx\,|f(x)|^2 < \infty$. Axiom 1 is the only one requiring more than a second to
check.

7 The set of solutions to the equation $\partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2 = 0$ in any fixed domain. (Laplace’s equation)

8 Like example 5, but with $n = \infty$.

9 Like example 8, but each vector has only a finite number of non-zero entries.

10 Like example 8, but restricting the set so that $\sum_{1}^{\infty}|a_k|^2 < \infty$. Again, only axiom one takes work.

11 Like example 10, but the sum is $\sum_{1}^{\infty}|a_k| < \infty$.

12 Like example 10, but $\sum_{1}^{\infty}|a_k|^p < \infty$. ($p > 1$)

13 Like example 6, but $\int_a^b dx\,|f(x)|^p < \infty$.

14 Any of examples 2-13, but make the scalars complex, and the functions complex valued. 

15 The set of all $n \times n$ matrices, with addition being defined element by element.

16 The set of all polynomials with the obvious laws of addition and multiplication by scalars.

17 Complex valued functions on the domain $[a < x < b]$ with $\sum_x |f(x)|^2 < \infty$. (Whatever this means. See
problem 6.18)

18 $\{\vec O\}$, the space consisting of the zero vector alone.

19 The set of all solutions to the equations describing small motions of the surface of a drumhead.

20 The set of solutions of Maxwell’s equations without charges or currents and with finite energy. That is, $\int [E^2 + B^2]\,d^3x < \infty$.

21 The set of all functions of a complex variable that are differentiable everywhere and satisfy

$$\int dx\,dy\; e^{-x^2 - y^2}\,|f(z)|^2 < \infty,$$

where $z = x + iy$.

To verify that any of these is a vector space you have to run through the ten axioms, checking each one. (Actually, 
in a couple of pages there’s a theorem that will greatly simplify this.) To see what is involved, take the first, most familiar 
example, arrows that all start at one point, the origin. I'll go through the details of each of the ten axioms to show that 
the process of checking is very simple. There are some cases for which this checking isn’t so simple, but the difficulty is 
usually confined to verifying axiom one. 

The picture shows the definitions of addition of vectors and multiplication by scalars, the first two axioms. The 
commutative law, axiom 6, is clear, as the diagonal of the parallelogram doesn’t depend on which side you're looking at. 



The associative law, axiom 3, is also illustrated in the picture. The zero vector, axiom 4, appears in this picture as just 
a point, the origin. 

The definition of multiplication by a scalar is that the length of the arrow is changed (or even reversed) by the factor 
given by the scalar. Axioms 7 and 8 are then simply the statement that the graphical interpretation of multiplication of 
numbers involves adding and multiplying their lengths.

Axioms 5 and 9 appear in this picture.

Finally, axiom 10 is true because you leave the vector alone when you multiply it by one. 

This process looks almost too easy. Some of the axioms even look as though they are trivial and unnecessary. The
last one for example: why do you have to assume that multiplication by one leaves the vector alone? For an answer,
I will show an example of something that satisfies all of axioms one through nine but not the tenth. These processes,
addition of vectors and multiplication by scalars, are functions. I could write “$f(\vec v_1, \vec v_2)$” instead of “$\vec v_1 + \vec v_2$” and write
“$g(\alpha, \vec v)$” instead of “$\alpha\vec v$”. The standard notation is just that: a common way to write a vector-valued function of
two variables. I can define any function that I want and then see if it satisfies the required properties.

On the set of arrows just above, redefine multiplication by a scalar (the function $g$ of the preceding paragraph)
to be the zero vector for all scalars and vectors. That is, $\alpha\vec v = \vec O$ for all $\alpha$ and $\vec v$. Look back and you see that this
definition satisfies all the assumptions 1-9 but not 10. For example, 9: $\alpha(\vec v_1 + \vec v_2) = \alpha\vec v_1 + \alpha\vec v_2$ because both sides
of the equation are the zero vector. This observation proves that the tenth axiom is independent of the others. If you 
could derive the tenth axiom from the first nine, then this example couldn’t exist. This construction is of course not a 
vector space. 
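You can watch this counterexample work on a computer. The sketch below uses pairs of numbers to stand in for arrows in the plane, keeps ordinary componentwise addition, and replaces multiplication by a scalar with a function that always returns the zero vector. The names `zero_mul`, `scalars`, and `vectors` are invented for this illustration, and checking a handful of samples is of course only a spot check, not a proof.

```python
def add(v, w):
    # Ordinary componentwise addition of two arrows in the plane.
    return (v[0] + w[0], v[1] + w[1])

def zero_mul(a, v):
    # The redefined "multiplication by a scalar": always the zero vector.
    return (0, 0)

scalars = [-2, 0, 1, 3]
vectors = [(1, 0), (0, 1), (2, -5), (0, 0)]

axiom7 = all(zero_mul(a + b, v) == add(zero_mul(a, v), zero_mul(b, v))
             for a in scalars for b in scalars for v in vectors)
axiom8 = all(zero_mul(a * b, v) == zero_mul(a, zero_mul(b, v))
             for a in scalars for b in scalars for v in vectors)
axiom9 = all(zero_mul(a, add(v, w)) == add(zero_mul(a, v), zero_mul(a, w))
             for a in scalars for v in vectors for w in vectors)
axiom10 = all(zero_mul(1, v) == v for v in vectors)

print(axiom7, axiom8, axiom9, axiom10)   # True True True False
```

Axioms 1 and 3 through 6 involve only addition, which hasn't been changed, so they hold as before.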


Function Spaces 

Is example 2 a vector space? How can a function be a vector? This comes down to your understanding of the word 
“function.” Is $f(x)$ a function or is $f(x)$ a number? Answer: it’s a number. This is a confusion caused by the
conventional notation for functions. We routinely call $f(x)$ a function, but it is really the result of feeding the particular
value, $x$, to the function $f$ in order to get the number $f(x)$. This confusion in notation is so ingrained that it's hard to
change, though in more sophisticated mathematics books it is changed.

In a better notation, the symbol $f$ is the function, expressing the relation between
all the possible inputs and their corresponding outputs. Then $f(1)$, or $f(\pi)$, or $f(x)$ are
the results of feeding $f$ the particular inputs, and the results are (at least for example 2)
real numbers. Think of the function $f$ as the whole graph relating input to output; the pair
$(x, f(x))$ is then just one point on the graph. Adding two functions is adding their graphs.

For a precise, set theoretic definition of the word function, see section 12.1. Reread the 
statement of example 2 in light of these comments. 
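Programming languages that treat functions as objects make this distinction hard to miss. In the sketch below (the helper names `add` and `scale` are mine), `f3` is the function, a single object, while `f3(0.3)` is a number. The operations take functions in and hand a new function back, which is exactly pointwise addition and multiplication by a scalar.

```python
import math

def add(f, g):
    # The sum of two functions is a new function, defined pointwise.
    return lambda x: f(x) + g(x)

def scale(a, f):
    # A scalar times a function is a new function, defined pointwise.
    return lambda x: a * f(x)

f3 = add(math.sin, math.cos)   # f3 is a function (the vector)
h = scale(2.0, f3)             # so is h

print(callable(f3))            # True: f3 itself is not a number
print(f3(0.3), math.sin(0.3) + math.cos(0.3))
print(h(0.3), 2.0 * f3(0.3))
```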



[Figure: the graphs of $f_1$ and $f_2$ and their sum, $f_1 + f_2 = f_3$]

Special Function Space

Go through another of the examples of vector spaces written above. Number 6, the square-integrable real-valued functions 
on the interval a < x < b. The single difficulty here is the first axiom: is the sum of two square-integrable functions 
itself square-integrable? The other nine axioms are yours to check. 

Suppose that

$$\int_a^b f(x)^2\,dx < \infty \qquad\text{and}\qquad \int_a^b g(x)^2\,dx < \infty.$$

Simply note the combination

$$\bigl(f(x) + g(x)\bigr)^2 + \bigl(f(x) - g(x)\bigr)^2 = 2f(x)^2 + 2g(x)^2$$

The integral of the right-hand side is by assumption finite, so the same must hold for the left side. Both terms on the
left are non-negative, so each of them separately has a finite integral. This says that the
sum (and difference) of two square-integrable functions is square-integrable. For this example then, it isn't very difficult
to show that it satisfies the axioms for a vector space, but it requires more than just a glance. 
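Here is a numerical illustration of that argument, with two functions that I picked for the purpose: $f(x) = x^{-1/4}$ and $g(x) = x^{-1/3}$ on $0 < x < 1$. Both blow up at the origin but both are square-integrable, with $\int f^2 = 2$ and $\int g^2 = 3$ exactly, and the sum has $\int (f+g)^2 = 9.8$. The midpoint rule used here converges slowly near the singularity, so take the printed numbers as rough; the inequality holds term by term in the sum regardless.

```python
def f(x):
    return x ** -0.25          # f^2 = x^(-1/2), integrable on (0, 1)

def g(x):
    return x ** (-1.0 / 3.0)   # g^2 = x^(-2/3), integrable on (0, 1)

def midpoint_integral(h, steps=50_000):
    # Midpoint rule on (0, 1); it never evaluates h at x = 0.
    dx = 1.0 / steps
    return sum(h((j + 0.5) * dx) for j in range(steps)) * dx

int_f2 = midpoint_integral(lambda x: f(x) ** 2)
int_g2 = midpoint_integral(lambda x: g(x) ** 2)
int_sum2 = midpoint_integral(lambda x: (f(x) + g(x)) ** 2)
int_diff2 = midpoint_integral(lambda x: (f(x) - g(x)) ** 2)

# The identity: the two sides agree up to rounding.
print(int_sum2 + int_diff2, 2 * int_f2 + 2 * int_g2)
# The consequence: the integral of (f + g)^2 is bounded by a finite number.
print(int_sum2 <= 2 * int_f2 + 2 * int_g2)
```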

There are a few properties of vector spaces that seem to be missing. There is the somewhat odd notation v' for 
the additive inverse in axiom 5. Isn't that just $-\vec v$? Isn't the zero vector simply the number zero times a vector? Yes in
both cases, but these are theorems that follow easily from the ten axioms listed. See problem 6.20. I'll do part (a) of 
that exercise as an example here: 

Theorem: the vector $\vec O$ is unique.

Proof: assume it is not, then there are two such vectors, $\vec O_1$ and $\vec O_2$.
By [4], $\vec O_1 + \vec O_2 = \vec O_1$ ($\vec O_2$ is a zero vector)

By [6], the left side is $\vec O_2 + \vec O_1$

By [4], this is $\vec O_2$ ($\vec O_1$ is a zero vector)

Put these together and $\vec O_1 = \vec O_2$.

Theorem: If a subset of a vector space is closed under addition and multiplication by scalars, then it is itself 
a vector space. This means that if you add two elements of this subset to each other they remain in the subset and 
multiplying any element of the subset by a scalar leaves it in the subset. It is a "subspace.” 

Proof: the assumption of the theorem is that axioms 1 and 2 are satisfied as regards the subset. That axioms 3 through 
10 hold follows because the elements of the subset inherit their properties from the larger vector space of which they 
are a part. Is this all there is to it? Not quite. Axioms 4 and 5 take a little more thought, and need the results of the 
problem 6.20, parts (b) and (d).

6.4 Linear Independence

A set of non-zero vectors is linearly dependent if one element of the set can be written as a linear combination of the 
others. The set is linearly independent if this cannot be done. 

Bases, Dimension, Components 

A basis for a vector space is a linearly independent set of vectors such that any vector in the space can be written as a 
linear combination of elements of this set. The dimension of the space is the number of elements in this basis. 

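If you take the usual vector space of arrows that start from the origin and lie in a plane, the common basis is
denoted $\hat\imath$, $\hat\jmath$. If I propose a basis consisting of

$$\hat\imath, \qquad -\tfrac{1}{2}\hat\imath + \tfrac{\sqrt3}{2}\hat\jmath, \qquad -\tfrac{1}{2}\hat\imath - \tfrac{\sqrt3}{2}\hat\jmath$$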

these will certainly span the space. Every vector can be written as a linear combination of them. They are however, 
redundant; the sum of all three is zero, so they aren't linearly independent and aren’t a basis. If you use them as if they 
are a basis, the components of a given vector won’t be unique. Maybe that's o.k. and you want to do it, but either be 
careful or look up the mathematical subject called "frames.” 
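The non-uniqueness is easy to exhibit. The sketch below takes the three vectors to be $\hat\imath$ and $-\tfrac12\hat\imath \pm \tfrac{\sqrt3}{2}\hat\jmath$ (three unit arrows 120° apart, which I am assuming is the set intended), stored as coordinate pairs; the names `E` and `combine` are just for this illustration. Because the three add to zero, you can add the same constant to all three coefficients without changing the vector they describe.

```python
import math

# Three unit vectors at 120 degrees to each other; their sum is zero.
E = [(1.0, 0.0),
     (-0.5, math.sqrt(3) / 2),
     (-0.5, -math.sqrt(3) / 2)]

def combine(coeffs):
    # The linear combination c1*E[0] + c2*E[1] + c3*E[2] as an (x, y) pair.
    return (sum(c * e[0] for c, e in zip(coeffs, E)),
            sum(c * e[1] for c, e in zip(coeffs, E)))

total = combine((1.0, 1.0, 1.0))
v_a = combine((2.0, 1.0, 0.0))
v_b = combine((3.0, 2.0, 1.0))   # every coefficient raised by 1

print(total)   # zero, up to rounding
print(v_a)
print(v_b)     # same vector as v_a, different "components"
```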

Beginning with the most elementary problems in physics and mathematics, it is clear that the choice of an 
appropriate coordinate system can provide great computational advantages. In dealing with the usual two and three 
dimensional vectors it is useful to express an arbitrary vector as a sum of unit vectors. Similarly, the use of Fourier series 
for the analysis of functions is a very powerful tool in analysis. These two ideas are essentially the same thing when you 
look at them as aspects of vector spaces. 

If the elements of the basis are denoted $\vec e_i$, and a vector $\vec a$ is

$$\vec a = \sum_i a_i\,\vec e_i,$$

the numbers $\{a_i\}$ are called the components of $\vec a$ in the specified basis. Note that you don’t have to talk about
orthogonality or unit vectors or any other properties of the basis vectors save that they span the space and they're 
independent. 

Example 1 is the prototype for the subject, and the basis usually chosen is the one designated $\hat x$, $\hat y$, (and $\hat z$ for
three dimensions). Another notation for this is $\hat\imath$, $\hat\jmath$, $\hat k$; I'll use $\hat x$-$\hat y$. In any case, the two (or three) arrows are at right
angles to each other.

In example 5, the simplest choice of basis is

$$\begin{aligned}
\vec e_1 &= (1\ \ 0\ \ 0\ \ \ldots\ \ 0)\\
\vec e_2 &= (0\ \ 1\ \ 0\ \ \ldots\ \ 0)\\
&\ \ \vdots\\
\vec e_n &= (0\ \ 0\ \ 0\ \ \ldots\ \ 1)
\end{aligned} \tag{6.1}$$

In example 6, if the domain of the functions is from $-\infty$ to $+\infty$, a possible basis is the set of functions

$$\psi_n(x) = x^n e^{-x^2/2}.$$

The major distinction between this and the previous cases is that the dimension here is infinite. There is a basis vector 
corresponding to each non-negative integer. It's not obvious that this is a basis, but it's true. 

If two vectors are equal to each other and you express them in the same basis, the corresponding components
must be equal.

$$\sum_i a_i\,\vec e_i = \sum_i b_i\,\vec e_i \quad\Longrightarrow\quad a_i = b_i \ \text{ for all } i \tag{6.2}$$

Suppose you have the relation between two functions of time

$$A - B\omega + \gamma t = \beta t \tag{6.3}$$

that is, that the two functions are the same. Think of this in terms of vectors: on the vector space of polynomials in $t$ a
basis is

$$\vec e_0 = 1, \quad \vec e_1 = t, \quad \vec e_2 = t^2, \quad \text{etc.}$$

Translate the preceding equation into this notation.

$$(A - B\omega)\,\vec e_0 + \gamma\,\vec e_1 = \beta\,\vec e_1 \tag{6.4}$$

For this to be valid the corresponding components must match:

$$A - B\omega = 0, \quad\text{and}\quad \gamma = \beta$$
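The component matching in Eqs. (6.3) and (6.4) can be mimicked in a few lines of Python. This is only a sketch: a polynomial in t is stored as its list of components in the basis 1, t, t², and so on; the function name `components_match` and the sample numbers are assumptions made for illustration.

```python
def components_match(left, right, tol=1e-12):
    """Compare two polynomials given as component lists [c0, c1, c2, ...]."""
    n = max(len(left), len(right))
    # Pad the shorter list with zero components so the lengths agree.
    left = list(left) + [0.0] * (n - len(left))
    right = list(right) + [0.0] * (n - len(right))
    return all(abs(a - b) <= tol for a, b in zip(left, right))

# Hypothetical sample values, chosen so that A - B*omega = 0 and gamma = beta.
A, B, omega, gamma, beta = 6.0, 2.0, 3.0, 0.5, 0.5

lhs = [A - B * omega, gamma]   # (A - B*omega) e_0 + gamma e_1
rhs = [0.0, beta]              # beta e_1

same = components_match(lhs, rhs)
different = components_match([1.0, gamma], rhs)  # constant term no longer zero

print(same, different)
```

The two functions of time are equal exactly when every component agrees, which is the content of Eq. (6.2).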


Differential Equations 

When you encounter differential equations such as

$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = 0, \qquad\text{or}\qquad \gamma\frac{d^3x}{dt^3} + kt^2\frac{dx}{dt} + \alpha e^{-\beta t}x = 0, \tag{6.5}$$

the sets of solutions to each of these equations form vector spaces. All you have to do is to check the axioms, and
because of the theorem in section 6.3 you don't even have to do all of that. The solutions are functions, and as such 
they are elements of the vector space of example 2. All you need to do now is to verify that the sum of two solutions is 
a solution and that a constant times a solution is a solution. That’s what the phrase “linear, homogeneous” means. 

Another common differential equation is

$$\frac{d^2\theta}{dt^2} + \frac{g}{\ell}\sin\theta = 0$$

This describes the motion of an undamped pendulum, and the set of its solutions does not form a vector space. The sum
of two solutions is not a solution.
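A numerical sketch of this failure of superposition, in Python. The assumptions are mine: units are chosen so that g/ℓ = 1, the integrator is a classical fourth-order Runge-Kutta step, and the names `integrate` and `superposition_gap` are illustrative. If solutions formed a vector space, the solution starting from the sum of two sets of initial conditions would equal the sum of the two separate solutions. That holds for the linear equation x'' = -x and fails for the pendulum.

```python
import math

def integrate(accel, x0, v0, t_end=5.0, steps=5000):
    """Integrate x'' = accel(x) with classical RK4 and return x(t_end)."""
    h = t_end / steps
    x, v = x0, v0
    for _ in range(steps):
        k1x, k1v = v, accel(x)
        k2x, k2v = v + 0.5 * h * k1v, accel(x + 0.5 * h * k1x)
        k3x, k3v = v + 0.5 * h * k2v, accel(x + 0.5 * h * k2x)
        k4x, k4v = v + h * k3v, accel(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x

def superposition_gap(accel):
    """Difference between 'solution of the sum' and 'sum of the solutions'."""
    a = integrate(accel, 1.0, 0.0)      # start displaced, at rest
    b = integrate(accel, 0.0, 1.0)      # start at the bottom, moving
    both = integrate(accel, 1.0, 1.0)   # summed initial conditions
    return abs(both - (a + b))

linear_gap = superposition_gap(lambda x: -x)              # x'' = -x
pendulum_gap = superposition_gap(lambda x: -math.sin(x))  # theta'' = -sin(theta)

print(linear_gap, pendulum_gap)
```

The first number is zero to rounding error; the second is not small, which is the statement that the pendulum's solutions are not closed under addition.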

The first of Eqs. (6.5) has two independent solutions,

$$x_1(t) = e^{-\gamma t}\cos\omega' t, \quad\text{and}\quad x_2(t) = e^{-\gamma t}\sin\omega' t \tag{6.6}$$

where $\gamma = b/2m$ and $\omega' = \sqrt{\frac{k}{m} - \frac{b^2}{4m^2}}$. This is from Eq. (4.8). Any solution of this differential equation is a linear
combination of these functions, and I can restate that fact in the language of this chapter by saying that $x_1$ and $x_2$ form
a basis for the vector space of solutions of the damped oscillator equation. It has dimension two.

The second equation of the pair (6.5) is a third order differential equation, and as such you will need to specify three 
conditions to determine the solution and to determine all the three arbitrary constants. In other words, the dimension of 
the solution space of this equation is three. 

In chapter 4 on the subject of differential equations, one of the topics was simultaneous differential equations,
coupled oscillations. The simultaneous differential equations, Eq. (4.45), are

$$m_1\frac{d^2x_1}{dt^2} = -k_1x_1 - k_3(x_1 - x_2), \qquad\text{and}\qquad m_2\frac{d^2x_2}{dt^2} = -k_2x_2 - k_3(x_2 - x_1)$$

and have solutions that are pairs of functions. In the development of section 4.10 (at least for the equal mass, symmetric
case), I found four pairs of functions that satisfied the equations. Now translate that into the language of this chapter,
using the notation of column matrices for the functions. The solution is the vector

$$\begin{pmatrix} x_1(t)\\ x_2(t) \end{pmatrix}$$

and the four basis vectors for this four-dimensional vector space are

$$\vec e_1 = \begin{pmatrix} e^{i\omega_1 t}\\ e^{i\omega_1 t} \end{pmatrix}, \quad
\vec e_2 = \begin{pmatrix} e^{-i\omega_1 t}\\ e^{-i\omega_1 t} \end{pmatrix}, \quad
\vec e_3 = \begin{pmatrix} e^{i\omega_2 t}\\ -e^{i\omega_2 t} \end{pmatrix}, \quad
\vec e_4 = \begin{pmatrix} e^{-i\omega_2 t}\\ -e^{-i\omega_2 t} \end{pmatrix}$$

Any solution of the differential equations is a linear combination of these. In the original notation, you have Eq. (4.52).
In the current notation you have

$$\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = A_1\vec e_1 + A_2\vec e_2 + A_3\vec e_3 + A_4\vec e_4$$



6.5 Norms 

The “norm” or length of a vector is a particularly important type of function that can be defined on a vector space. It
is a function, usually denoted by $\|\ \|$, that satisfies

1. $\|\vec v\| \ge 0$; $\|\vec v\| = 0$ if and only if $\vec v = \vec O$

2. $\|\alpha\vec v\| = |\alpha|\,\|\vec v\|$

3. $\|\vec v_1 + \vec v_2\| \le \|\vec v_1\| + \|\vec v_2\|$ (the triangle inequality)

The distance between two vectors $\vec v_1$ and $\vec v_2$ is taken to be $\|\vec v_1 - \vec v_2\|$.

6.6 Scalar Product 

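The scalar product of two vectors is a scalar valued function of two vector variables. It could be denoted as $f(\vec u, \vec v)$, but
a standard notation for it is $\langle\vec u, \vec v\rangle$. It must satisfy the requirements

1. $\langle\vec w, (\vec u + \vec v)\rangle = \langle\vec w, \vec u\rangle + \langle\vec w, \vec v\rangle$

2. $\langle\vec w, \alpha\vec v\rangle = \alpha\langle\vec w, \vec v\rangle$

3. $\langle\vec u, \vec v\rangle^* = \langle\vec v, \vec u\rangle$

4. $\langle\vec v, \vec v\rangle \ge 0$; and $\langle\vec v, \vec v\rangle = 0$ if and only if $\vec v = \vec O$

When a scalar product exists on a space, a norm naturally does too:

$$\|\vec v\| = \sqrt{\langle\vec v, \vec v\rangle}. \tag{6.7}$$

That this is a norm will follow from the Cauchy-Schwarz inequality. Not all norms come from scalar products.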

Examples 

Use the examples of section 6.3 to see what these are. The numbers here refer to the numbers of that section. 

1 A norm is the usual picture of the length of the line segment. A scalar product is the usual product of lengths times
the cosine of the angle between the vectors.

$$\langle\vec u, \vec v\rangle = \vec u\cdot\vec v = uv\cos\theta. \tag{6.8}$$

4 A norm can be taken as the least upper bound of the magnitude of the function. This is distinguished from the
"maximum" in that the function may not actually achieve a maximum value. Since it is bounded however, there is
an upper bound (many in fact) and we take the smallest of these as the norm. On $-\infty < x < +\infty$, the function
$|\tan^{-1}x|$ has $\pi/2$ for its least upper bound, though it never equals that number.

5 A possible scalar product is

$$\bigl\langle(a_1, \ldots, a_n), (b_1, \ldots, b_n)\bigr\rangle = \sum_{k=1}^{n} a_k^*\,b_k. \tag{6.9}$$

There are other scalar products for the same vector space, for example

$$\bigl\langle(a_1, \ldots, a_n), (b_1, \ldots, b_n)\bigr\rangle = \sum_{k=1}^{n} k\,a_k^*\,b_k \tag{6.10}$$

In fact any other positive function can appear as the coefficient in the sum and it still defines a valid scalar product. 
It's surprising how often something like this happens in real situations. In studying normal modes of oscillation the 
masses of different particles will appear as coefficients in a natural scalar product. 

I used complex conjugation on the first factor here, but example 5 referred to real numbers only. The reason for 
leaving the conjugation in place is that when you jump to example 14 you want to allow for complex numbers, and 
it's harmless to put it in for the real case because in that instance it leaves the number alone. 

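For a norm, there are many possibilities:

$$\begin{aligned}
(1)\quad \|(a_1, \ldots, a_n)\| &= \sqrt{\sum_{k=1}^{n}|a_k|^2}\\
(2)\quad \|(a_1, \ldots, a_n)\| &= \sum_{k=1}^{n}|a_k|\\
(3)\quad \|(a_1, \ldots, a_n)\| &= \max_{k=1}^{n}|a_k|\\
(4)\quad \|(a_1, \ldots, a_n)\| &= \max_{k=1}^{n} k\,|a_k|
\end{aligned} \tag{6.11}$$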

The United States Postal Service prefers a variation on the second of these norms, see problem 8.45. 
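The four norms of Eq. (6.11) are easy to compare numerically. The following Python sketch uses function names of my own choosing and a sample vector picked only for illustration; `abs` handles complex entries as well as real ones.

```python
import math

def norm_euclidean(a):
    """Norm (1): square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))

def norm_sum(a):
    """Norm (2): sum of the magnitudes."""
    return sum(abs(x) for x in a)

def norm_max(a):
    """Norm (3): largest magnitude."""
    return max(abs(x) for x in a)

def norm_weighted_max(a):
    """Norm (4): largest value of k * |a_k|, with k counted from 1."""
    return max(k * abs(x) for k, x in enumerate(a, start=1))

sample = (3.0, -4.0)
print(norm_euclidean(sample))     # 5.0
print(norm_sum(sample))           # 7.0
print(norm_max(sample))           # 4.0
print(norm_weighted_max(sample))  # 8.0
```

The same vector has four different "lengths," which is why a statement such as "within 1000 feet" is ambiguous until the norm is specified.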

6 A possible choice for a scalar product is

$$\langle f, g\rangle = \int_a^b dx\, f(x)^*\,g(x). \tag{6.12}$$




9 Scalar products and norms used here are just like those used for example 5. The difference is that the sums go from
1 to infinity. The problem of convergence doesn't occur because there are only a finite number of non-zero terms.


10 Take the norm to be

$$\|(a_1, a_2, \ldots)\| = \sqrt{\sum_{k=1}^{\infty}|a_k|^2}, \tag{6.13}$$


and this by assumption will converge. The natural scalar product is like that of example 5, but with the sum going 
out to infinity. It requires a small amount of proof to show that this will converge. See problem 6.19. 

11 A norm is $\|\vec v\| = \sum_{i=1}^{\infty}|a_i|$. There is no scalar product that will produce this norm, a fact that you can prove by
using the results of problem 6.13.


13 A natural norm is

$$\|f\| = \left[\int_a^b dx\,|f(x)|^p\right]^{1/p} \tag{6.14}$$


To demonstrate that this is a norm requires the use of some special inequalities found in advanced calculus books. 

15 If $A$ and $B$ are two matrices, a scalar product is $\langle A, B\rangle = \operatorname{Tr}(A^\dagger B)$, where $\dagger$ is the transpose complex conjugate
of the matrix and Tr means the trace, the sum of the diagonal elements. Several possible norms can occur. One is
$\|A\| = \sqrt{\operatorname{Tr}(A^\dagger A)}$. Another is the maximum value of $\|A\vec u\|$, where $\vec u$ is a unit vector and the norm of $\vec u$ is taken
to be $\bigl[|u_1|^2 + \cdots + |u_n|^2\bigr]^{1/2}$.

19 A valid definition of a norm for the motions of a drumhead is its total energy, kinetic plus potential. How do you 
describe this mathematically? It's something like

$$\int dx\,dy\left[\left(\frac{\partial f}{\partial t}\right)^2 + \bigl(\nabla f\bigr)^2\right]$$

I’ve left out all the necessary constants, such as mass density of the drumhead and tension in the drumhead. You 
can perhaps use dimensional analysis to surmise where they go. 

There is an example in criminal law in which the distinctions between some of these norms have very practical 
consequences. If you’re caught selling drugs in New York there is a longer sentence if your sale is within 1000 feet of 
a school. If you are an attorney defending someone accused of this crime, which of the norms in Eq. (6.11) would you 
argue for? The legislators who wrote this law didn't know linear algebra, so they didn't specify which norm they intended. 
The prosecuting attorney argued for norm #1, “as the crow flies,” but the defense argued that “crows don’t sell drugs”
and humans move along city streets, so norm #2 is more appropriate.

The New York Court of Appeals decided that the Pythagorean norm (#1) is the appropriate one and they rejected
the use of the pedestrian norm that the defendant advocated (#2).
www.courts.state.ny.us/ctapps/decisions/nov05/162opn05.pdf


6.7 Bases and Scalar Products 

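When there is a scalar product, a most useful type of basis is the orthonormal one, satisfying

$$\langle\vec e_i, \vec e_j\rangle = \delta_{ij} = \begin{cases} 1 & \text{if } i = j\\ 0 & \text{if } i \ne j \end{cases} \tag{6.15}$$

The notation $\delta_{ij}$ represents the very useful Kronecker delta symbol.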

In the example of Eq. (6.1) the basis vectors are orthonormal with respect to the scalar product in Eq. (6.9). It is 
orthogonal with respect to the other scalar product mentioned there, but it is not in that case normalized to magnitude 
one. 

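To see how the choice of even an orthonormal basis depends on the scalar product, try a different scalar product
on this space. Take the special case of two dimensions. The vectors are now pairs of numbers. Think of the vectors as
$2\times1$ column matrices and use the $2\times2$ matrix

$$\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}$$

Take the scalar product of two vectors to be

$$\bigl\langle(a_1, a_2), (b_1, b_2)\bigr\rangle = \begin{pmatrix} a_1^* & a_2^* \end{pmatrix}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} b_1\\ b_2 \end{pmatrix} = 2a_1^*b_1 + a_1^*b_2 + a_2^*b_1 + 2a_2^*b_2 \tag{6.16}$$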


To show that this satisfies all the defined requirements for a scalar product takes a small amount of labor. The vectors 
that you may expect to be orthogonal, (1 0) and (0 1), are not. 
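A short Python sketch of the scalar product in Eq. (6.16) shows this directly. The names `METRIC` and `scalar_product` are illustrative, and the pair (1, 1), (1, -1) is simply one choice that happens to be orthogonal under this product; it is not taken from the text.

```python
# The 2x2 matrix that defines the scalar product of Eq. (6.16).
METRIC = ((2, 1), (1, 2))

def scalar_product(a, b, metric=METRIC):
    """Return sum_ij conj(a_i) * M_ij * b_j for two-component vectors."""
    return sum(
        complex(a[i]).conjugate() * metric[i][j] * b[j]
        for i in range(2)
        for j in range(2)
    )

e1, e2 = (1, 0), (0, 1)
overlap = scalar_product(e1, e2)   # equals 1, so e1 and e2 are not orthogonal

u, v = (1, 1), (1, -1)
uv = scalar_product(u, v)          # equals 0: these two are orthogonal
uu = scalar_product(u, u)          # equals 6, so u is not normalized
vv = scalar_product(v, v)          # equals 2

print(overlap, uv, uu, vv)
```

Dividing `u` by the square root of 6 and `v` by the square root of 2 would give a basis that is orthonormal with respect to this scalar product, even though it is not the familiar pair (1 0), (0 1).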

In example 6, if we let the domain of the functions be $-L < x < +L$ and the scalar product is as in Eq. (6.12),
then the set of trigonometric functions can be used as a basis.

$$\sin\frac{n\pi x}{L} \quad\text{and}\quad \cos\frac{m\pi x}{L}, \qquad n = 1, 2, 3, \ldots \quad\text{and}\quad m = 0, 1, 2, 3, \ldots$$


That a function can be written as a series

$$f(x) = \sum_{n=1}^{\infty} a_n\sin\frac{n\pi x}{L} + \sum_{m=0}^{\infty} b_m\cos\frac{m\pi x}{L} \tag{6.17}$$

on the domain $-L < x < +L$ is just an example of Fourier series, and the components of $f$ in this basis are Fourier
coefficients $a_1, \ldots, b_0, \ldots$. An equally valid and more succinctly stated basis is

$$e^{n\pi i x/L}, \qquad n = 0, \pm1, \pm2, \ldots$$


Chapter 5 on Fourier series shows many other choices of bases, all orthogonal, but not necessarily normalized. 
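The phrase "orthogonal, but not necessarily normalized" can be checked numerically for the sines and cosines on $-L < x < +L$. This Python sketch estimates the scalar product of Eq. (6.12) with a midpoint rule; the half-width `L = 2.0`, the step count, and the helper names are assumptions made for illustration, and the functions are real so the complex conjugate is omitted.

```python
import math

def inner(f, g, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of f(x) * g(x) from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))

L = 2.0  # hypothetical half-width of the interval

def sine(n):
    return lambda x: math.sin(n * math.pi * x / L)

def cosine(m):
    return lambda x: math.cos(m * math.pi * x / L)

sin1_sin2 = inner(sine(1), sine(2), -L, L)      # orthogonal: 0
sin1_cos1 = inner(sine(1), cosine(1), -L, L)    # orthogonal: 0
sin2_sin2 = inner(sine(2), sine(2), -L, L)      # equals L, not 1
cos0_cos0 = inner(cosine(0), cosine(0), -L, L)  # equals 2L, not 1

print(sin1_sin2, sin1_cos1, sin2_sin2, cos0_cos0)
```

Different basis functions have zero scalar product, while the scalar product of a basis function with itself is L (or 2L for the constant), so dividing by the square root of that number is what normalization would require.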

To emphasize the relationship between Fourier series and the ideas of vector spaces, this 
picture represents three out of the infinite number of basis vectors and part of a function that 
uses these vectors to form a Fourier series. 

$$f(x) = \frac{1}{2}\sin\frac{\pi x}{L} + \frac{2}{3}\sin\frac{2\pi x}{L} + \frac{1}{3}\sin\frac{3\pi x}{L} + \cdots$$

The orthogonality of the sines becomes the geometric term “perpendicular," and if you look at 
section 8.11, you will see that the subject of least square fitting of data to a sum of sine functions 
leads you right back to Fourier series, and to the same picture as here. 



6.8 Gram-Schmidt Orthogonalization 

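From a basis that is not orthonormal, it is possible to construct one that is. This device is called the Gram-Schmidt
procedure. Suppose that a basis is known (finite or infinite), $\vec v_1, \vec v_2, \ldots$

Step 1: normalize $\vec v_1$. $\quad\vec e_1 = \vec v_1\big/\sqrt{\langle\vec v_1, \vec v_1\rangle}$

Step 2: construct a linear combination of $\vec v_1$ and $\vec v_2$ that is orthogonal to $\vec v_1$:
Let $\vec e_{20} = \vec v_2 - \vec e_1\langle\vec e_1, \vec v_2\rangle$ and then normalize it.

$$\vec e_2 = \vec e_{20}\big/\langle\vec e_{20}, \vec e_{20}\rangle^{1/2} \tag{6.18}$$

Step 3: Let $\vec e_{30} = \vec v_3 - \vec e_1\langle\vec e_1, \vec v_3\rangle - \vec e_2\langle\vec e_2, \vec v_3\rangle$ etc., repeating step 2.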

What does this look like? See problem 6.3. 
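The three steps can be written as a short loop. The Python sketch below is one possible rendering, assuming real vectors stored as tuples and the ordinary dot product as the scalar product; the names `dot` and `gram_schmidt` are mine. The starting vectors are the ones from problem 6.6.

```python
import math

def dot(a, b):
    """Ordinary real dot product, used here as the scalar product."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, inner=dot):
    """Return an orthonormal list built from linearly independent `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the component of v along each basis vector found so far.
        for e in basis:
            c = inner(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        # Normalize what is left.
        length = math.sqrt(inner(w, w))
        basis.append([wi / length for wi in w])
    return basis

# v1 = x + y, v2 = y + z, v3 = z + x, written as (x, y, z) components.
e1, e2, e3 = gram_schmidt([(1, 1, 0), (0, 1, 1), (1, 0, 1)])

print(e1)
print(e2)
print(e3)
```

The last vector comes out as (1, -1, 1) divided by the square root of 3, in agreement with the answer quoted in problem 6.6. Passing a different `inner` function changes the scalar product without changing the procedure.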

6.9 Cauchy-Schwarz Inequality

For common three-dimensional vector geometry, it is obvious that for any real angle, $\cos^2\theta \le 1$. In terms of a dot
product, this is $|\vec A\cdot\vec B| \le AB$. This can be generalized to any scalar product on any vector space:

$$|\langle\vec u, \vec v\rangle| \le \|\vec u\|\,\|\vec v\|. \tag{6.19}$$

The proof starts from a simple but not-so-obvious point. The scalar product of a vector with itself is by definition
non-negative, so for any two vectors $\vec u$ and $\vec v$ you have the inequality

$$\langle\vec u - \lambda\vec v, \vec u - \lambda\vec v\rangle \ge 0, \tag{6.20}$$

where $\lambda$ is any complex number. This expands to

$$\langle\vec u, \vec u\rangle + |\lambda|^2\langle\vec v, \vec v\rangle - \lambda\langle\vec u, \vec v\rangle - \lambda^*\langle\vec v, \vec u\rangle \ge 0. \tag{6.21}$$


How much bigger than zero the left side is will depend on the parameter $\lambda$. To find the smallest value that the left side
can have you simply differentiate. Let $\lambda = x + iy$ and differentiate with respect to $x$ and $y$, setting the results to zero.
This gives (see problem 6.5)

$$\lambda = \langle\vec v, \vec u\rangle/\langle\vec v, \vec v\rangle. \tag{6.22}$$

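Substitute this value into the above inequality (6.21)

$$\langle\vec u, \vec u\rangle + \frac{|\langle\vec u, \vec v\rangle|^2}{\langle\vec v, \vec v\rangle} - \frac{|\langle\vec u, \vec v\rangle|^2}{\langle\vec v, \vec v\rangle} - \frac{|\langle\vec u, \vec v\rangle|^2}{\langle\vec v, \vec v\rangle} \ge 0. \tag{6.23}$$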


This becomes

$$|\langle\vec u, \vec v\rangle|^2 \le \langle\vec u, \vec u\rangle\langle\vec v, \vec v\rangle \tag{6.24}$$

This isn't quite the result needed, because Eq. (6.19) is written differently. It refers to a norm and I haven't established
that the square root of $\langle\vec v, \vec v\rangle$ is a norm. When I do, then the square root of this is the desired inequality (6.19).
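A numerical spot check of the proof, as a Python sketch. It uses the scalar product of Eq. (6.9) on complex triples; the two sample vectors and the name `inner` are arbitrary illustrative choices. It verifies Eq. (6.24) and also that the value of $\lambda$ in Eq. (6.22) makes the left side of (6.21) equal to $\langle u,u\rangle - |\langle u,v\rangle|^2/\langle v,v\rangle$.

```python
def inner(a, b):
    """Scalar product of Eq. (6.9): conjugate the first factor, then sum."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

# Arbitrary sample vectors with complex components.
u = (1 + 2j, -1j, 3)
v = (2, 1 - 1j, 0.5j)

lhs = abs(inner(u, v)) ** 2                 # |<u,v>|^2
rhs = inner(u, u).real * inner(v, v).real   # <u,u><v,v>

# The minimizing parameter of Eq. (6.22) and the resulting minimum of (6.21).
lam = inner(v, u) / inner(v, v)
residual = [x - lam * y for x, y in zip(u, v)]
minimum = inner(residual, residual).real
predicted = inner(u, u).real - lhs / inner(v, v).real

print(lhs <= rhs, minimum, predicted)
```

The vector `residual` is what remains of u after removing its part along v, and its squared length being non-negative is exactly the inequality (6.24).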

For a couple of examples of this inequality, take specific scalar products. First the common directed line segments:

$$\langle\vec u, \vec v\rangle = \vec u\cdot\vec v = uv\cos\theta, \qquad\text{so}\qquad |uv\cos\theta|^2 \le |u|^2|v|^2$$

$$\left|\int_a^b dx\,f(x)^*g(x)\right|^2 \le \left[\int_a^b dx\,|f(x)|^2\right]\left[\int_a^b dx\,|g(x)|^2\right]$$


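The first of these is familiar, but the second is not, though when you look at it from the general vector space viewpoint
they are essentially the same.

Norm from a Scalar Product

The equation (6.7), $\|\vec v\| = \sqrt{\langle\vec v, \vec v\rangle}$, defines a norm. Properties one and two for a norm are simple to check. (Do so.)
The third requirement, the triangle inequality, takes a bit of work and uses the inequality Eq. (6.24).

$$\begin{aligned}
\langle\vec v_1 + \vec v_2, \vec v_1 + \vec v_2\rangle &= \langle\vec v_1, \vec v_1\rangle + \langle\vec v_2, \vec v_2\rangle + \langle\vec v_1, \vec v_2\rangle + \langle\vec v_2, \vec v_1\rangle\\
&\le \langle\vec v_1, \vec v_1\rangle + \langle\vec v_2, \vec v_2\rangle + |\langle\vec v_1, \vec v_2\rangle| + |\langle\vec v_2, \vec v_1\rangle|\\
&= \langle\vec v_1, \vec v_1\rangle + \langle\vec v_2, \vec v_2\rangle + 2|\langle\vec v_1, \vec v_2\rangle|\\
&\le \langle\vec v_1, \vec v_1\rangle + \langle\vec v_2, \vec v_2\rangle + 2\sqrt{\langle\vec v_1, \vec v_1\rangle\langle\vec v_2, \vec v_2\rangle}\\
&= \left(\sqrt{\langle\vec v_1, \vec v_1\rangle} + \sqrt{\langle\vec v_2, \vec v_2\rangle}\right)^2
\end{aligned}$$

The first inequality is a property of complex numbers. The second one is Eq. (6.24). The square root of the last line is
the triangle inequality, thereby justifying the use of $\sqrt{\langle\vec v, \vec v\rangle}$ as the norm of $\vec v$ and in the process validating Eq. (6.19).

$$\|\vec v_1 + \vec v_2\| = \sqrt{\langle\vec v_1 + \vec v_2, \vec v_1 + \vec v_2\rangle} \le \sqrt{\langle\vec v_1, \vec v_1\rangle} + \sqrt{\langle\vec v_2, \vec v_2\rangle} = \|\vec v_1\| + \|\vec v_2\| \tag{6.25}$$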

6.10 Infinite Dimensions 

Is there any real difference between the cases where the dimension of the vector space is finite and the cases where it's
infinite? Yes. Most of the concepts are the same, but you have to watch out for the question of convergence. If the
dimension is finite, then when you write a vector in terms of a basis $\vec v = \sum a_k\vec e_k$, the sum is finite and you don't even
have to think about whether it converges or not. In the infinite-dimensional case you do.

It is even possible to have such a series converge, but not to converge to a vector. If that sounds implausible, let
me take an example from a slightly different context, ordinary rational numbers. These are the numbers $m/n$ where $m$
and $n$ are integers ($n \ne 0$). Consider the sequence

1, 14/10, 141/100, 1414/1000, 14142/10000, 141421/100000, ...

These are quotients of integers, but the limit is $\sqrt{2}$ and that's not* a rational number. Within the confines of rational
numbers, this sequence doesn't converge. You have to expand the context to get a limit. That context is the real
numbers. The same thing happens with vectors when the dimension of the space is infinite; in order to find a limit
you sometimes have to expand the context and to expand what you're willing to call a vector.

* Proof: If it is, then express it in simplest form as $m/n = \sqrt{2} \Rightarrow m^2 = 2n^2$ where $m$ and $n$ have no common
factor. This equation implies that $m$ must be even: $m = 2m_1$. Substitute this value, giving $2m_1^2 = n^2$. That in turn
implies that $n$ is even, and this contradicts the assumption that the original quotient was expressed without common
factors.

Look at example 9 from section 6.3. These are sets of numbers $(a_1, a_2, \ldots)$ with just a finite number of non-zero
entries. If you take a sequence of such vectors

$$(1, 0, 0, \ldots),\ (1, 1, 0, 0, \ldots),\ (1, 1, 1, 0, 0, \ldots),\ \ldots$$

Each has a finite number of non-zero elements but the limit of the sequence does not. It isn't a vector in the original
vector space. Can I expand to a larger vector space? Yes, just use example 8, allowing any number of non-zero elements.
For a more useful example of the same kind, start with the same space and take the sequence

$$(1, 0, \ldots),\ (1, 1/2, 0, \ldots),\ (1, 1/2, 1/3, 0, \ldots),\ \ldots$$

Again the limit of such a sequence doesn't have a finite number of entries, but example 10 will hold such a limit, because
$\sum_1^\infty |a_k|^2 < \infty$.

How do you know when you have a vector space without holes in it? That is, one in which these problems with 
limits don't occur? The answer lies in the idea of a Cauchy sequence. I'll start again with the rational numbers to 
demonstrate the idea. The sequence of numbers that led to the square root of two has the property that even though 
the elements of the sequence weren't approaching a rational number, the elements were getting close to each other. Let 
$\{r_n\}$, $n = 1, 2, \ldots$ be a sequence of rational numbers.

$$\lim_{n,m\to\infty}|r_n - r_m| = 0 \quad\text{means} \tag{6.26}$$

For any $\epsilon > 0$ there is an $N$ so that if both $n$ and $m$ are $> N$ then $|r_n - r_m| < \epsilon$.


This property defines the sequence r n as a Cauchy sequence. A sequence of rational numbers converges to a real number 
if and only if it is a Cauchy sequence; this is a theorem found in many advanced calculus texts. Still other texts will take 
a different approach and use the concept of a Cauchy sequence to construct the definition of the real numbers. 
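The decimal truncations of the square root of two make a convenient test case, and exact rational arithmetic keeps the check honest. In this Python sketch the helper name `truncation` is my own; it builds the n-digit truncation from an integer square root, so every term is an exact `Fraction`.

```python
from fractions import Fraction
from math import isqrt

def truncation(n):
    """Return the rational number given by sqrt(2) cut off after n decimals."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# 1, 14/10, 141/100, 1414/1000, ... as exact rationals.
terms = [truncation(n) for n in range(12)]

# Successive terms crowd together: consecutive terms differ by less than 10**(-n).
gaps = [abs(terms[n + 1] - terms[n]) for n in range(11)]
crowding = all(gaps[n] < Fraction(1, 10 ** n) for n in range(11))

# No term squares to exactly 2, so none of these rationals is itself the limit.
hits_two = any(r * r == 2 for r in terms)

print(terms[:4], crowding, hits_two)
```

The terms get arbitrarily close to one another, which is the Cauchy property of Eq. (6.26), yet no rational number in the list (or anywhere else) squares to 2, so the limit has to live in a larger set.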

The extension of this idea to infinite dimensional vector spaces requires simply that you replace the absolute value
by a norm, so that a Cauchy sequence is defined by $\lim_{n,m\to\infty}\|\vec v_n - \vec v_m\| = 0$. A “complete” vector space is one in which
every Cauchy sequence converges. A vector space that has a scalar product and that is also complete using the norm
that this scalar product defines is called a Hilbert Space.

I don't want to imply that the difference between finite and infinite dimensional vector spaces is just a technical
matter of convergence. In infinite dimensions there is far more room to move around, and the possible structures that
occur are vastly more involved than in the finite dimensional case. The subject of quantum mechanics has Hilbert Spaces
at the foundation of its whole structure.

Exercises

1 Determine if these are vector spaces with the usual rules for addition and multiplication by scalars. If not, which 
axiom(s) do they violate? 

(a) Quadratic polynomials of the form $ax^2 + bx$

(b) Quadratic polynomials of the form $ax^2 + bx + 1$

(c) Quadratic polynomials $ax^2 + bx + c$ with $a + b + c = 0$

(d) Quadratic polynomials $ax^2 + bx + c$ with $a + b + c = 1$

2 What is the dimension of the vector space of (up to) 5th degree polynomials having a double root at $x = 1$?

3 Starting from three dimensional vectors (the common directed line segments) and a single fixed vector $\vec B$, is the set
of all vectors $\vec v$ with $\vec v\cdot\vec B = 0$ a vector space? If so, what is its dimension?

Is the set of all vectors $\vec v$ with $\vec v\times\vec B = 0$ a vector space? If so, what is its dimension?

4 The set of all odd polynomials with the expected rules for addition and multiplication by scalars. Is it a vector space?

5 The set of all polynomials where the function "addition" is defined to be $f_3 = f_2 + f_1$ if the number $f_3(x) =
f_1(-x) + f_2(-x)$. Is it a vector space?

6 Same as the preceding, but for (a) even polynomials, (b) odd polynomials 

7 The set of directed line segments in the plane with the new rule for addition: add the vectors according to the usual
rule then rotate the result by 10° counterclockwise. Which vector space axioms are obeyed and which not?

Problems


6.1 Fourier series represents a choice of basis for functions on an interval. For suitably smooth functions on the interval
0 to $L$, one basis is

$$\vec e_n = \sqrt{\frac{2}{L}}\,\sin\frac{n\pi x}{L}. \tag{6.27}$$

Use the scalar product $\langle f, g\rangle = \int_0^L f^*(x)g(x)\,dx$ and show that this is an orthogonal basis normalized to 1, i.e. it is
orthonormal.


6.2 A function $F(x) = x(L - x)$ between zero and $L$. Use the basis of the preceding problem to write this vector in
terms of its components:

$$F = \sum_{1}^{\infty}\alpha_n\vec e_n. \tag{6.28}$$

If you take the result of using this basis and write the resulting function outside the interval $0 < x < L$, graph the result.


6.3 For two dimensional real vectors with the usual parallelogram addition, interpret in pictures the first two steps of 
the Gram-Schmidt process, section 6.8. 

6.4 For two dimensional real vectors with the usual parallelogram addition, interpret the vectors $\vec u$ and $\vec v$ and the
parameter $\lambda$ used in the proof of the Cauchy-Schwarz inequality in section 6.9. Start by considering the set of points
in the plane formed by $\{\vec u - \lambda\vec v\}$ as $\lambda$ ranges over the set of reals. In particular, when $\lambda$ was picked to minimize the
left side of the inequality (6.21), what do the vectors look like? Go through the proof and interpret it in the context of
these pictures. State the idea of the whole proof geometrically.

Note: I don't mean just copy the proof. Put the geometric interpretation into words. 

6.5 Start from Eq. (6.21) and show that the minimum value of the function of $\lambda = x + iy$ is given by the value stated
there. Note: this derivation applies to complex vector spaces and scalar products, not just real ones. Is this a minimum?

6.6 For the vectors in three dimensions,

$$\vec v_1 = \hat x + \hat y, \qquad \vec v_2 = \hat y + \hat z, \qquad \vec v_3 = \hat z + \hat x$$

use the Gram-Schmidt procedure to construct an orthonormal basis starting from $\vec v_1$. Ans: $\vec e_3 = (\hat x - \hat y + \hat z)/\sqrt{3}$

6.7 For the vector space of polynomials in $x$, use the scalar product defined as

$$\langle f, g\rangle = \int_{-1}^{1} dx\,f(x)^*g(x)$$

(Everything is real here, so the complex conjugation won't matter.) Start from the vectors

$$\vec v_0 = 1, \qquad \vec v_1 = x, \qquad \vec v_2 = x^2, \qquad \vec v_3 = x^3$$

and use the Gram-Schmidt procedure to construct an orthonormal basis starting from $\vec v_0$. Compare these results to
the results of section 4.11. [These polynomials appear in the study of electric potentials and in the study of angular 
momentum in quantum mechanics: Legendre polynomials.] 

6.8 Repeat the previous problem, but use a different scalar product:

$$\langle f, g\rangle = \int_{-\infty}^{\infty} dx\,e^{-x^2}f(x)^*g(x)$$

[These polynomials appear in the study of the harmonic oscillator in quantum mechanics and in the study of certain 
waves in the upper atmosphere. With a conventional normalization they are called Hermite polynomials.] 

6.9 Consider the set of all polynomials in x having degree < N . Show that this is a vector space and find its dimension. 

6.10 Consider the set of all polynomials in x having degree < N and only even powers. Show that this is a vector space 
and find its dimension. What about odd powers only? 

6.11 Which of these are vector spaces? 

(a) all polynomials of degree 3 

(b) all polynomials of degree < 3 [Is there a difference between (a) and (b)?] 

(c) all functions such that $f(1) = 2f(2)$

(d) all functions such that $f(2) = f(1) + 1$

(e) all functions satisfying $f(x + 2\pi) = f(x)$

(f) all positive functions

(g) all polynomials of degree < 4 satisfying $\int_{-1}^{1} dx\,x f(x) = 0$.

(h) all polynomials of degree < 4 where the coefficient of x is zero. 

[Is there a difference between (g) and (h)?]

6.12 (a) For the common picture of arrows in three dimensions, prove that the subset of vectors $\vec v$ that satisfy $\vec A\cdot\vec v = 0$
for fixed $\vec A$ forms a vector space. Sketch it.

(b) What if the requirement is that both $\vec A\cdot\vec v = 0$ and $\vec B\cdot\vec v = 0$ hold? Describe this and sketch it.

6.13 If a norm is defined in terms of a scalar product, $\|\vec v\| = \sqrt{\langle\vec v, \vec v\rangle}$, it satisfies the “parallelogram identity” (for real
scalars),

$$\|\vec u + \vec v\|^2 + \|\vec u - \vec v\|^2 = 2\|\vec u\|^2 + 2\|\vec v\|^2. \tag{6.29}$$

6.14 If a norm satisfies the parallelogram identity, then it comes from a scalar product. Again, assume real scalars.
Consider combinations of $\|\vec u + \vec v\|^2$, $\|\vec u - \vec v\|^2$ and construct what ought to be the scalar product. You then have to
prove the four properties of the scalar product as stated at the start of section 6.6. Numbers four and three are easy.
Number one requires that you keep plugging away, using the parallelogram identity (four times by my count).

Number two is downright tricky; leave it to the end. If you can prove it for integer and rational values of the constant 
a, consider it a job well done. I used induction at one point in the proof. The final step, extending a to all real values, 
requires some arguments about limits, and is typically the sort of reasoning you will see in an advanced calculus or 
mathematical analysis course. 

6.15 Modify the example number 2 of section 6.3 so that $f_3 = f_1 + f_2$ means $f_3(x) = f_1(x - a) + f_2(x - b)$ for fixed
$a$ and $b$. Is this still a vector space?

6.16 The scalar product you use depends on the problem you're solving. The fundamental equation (5.15) started from
the equation $u'' = \lambda u$ and resulted in the scalar product

$$\langle u_2, u_1\rangle = \int_a^b dx\,u_2(x)^*u_1(x)$$

Start instead from the equation $u'' = \lambda w(x)u$ and see what identity like that of Eq. (5.15) you come to. Assume $w$
is real. What happens if it isn't? In order to have a legitimate scalar product in the sense of section 6.6, what other
requirements must you make about $w$?

6.17 The equation describing the motion of a string that is oscillating with frequency $\omega$ about its stretched equilibrium
position is

$$\frac{d}{dx}\left(T(x)\frac{dy}{dx}\right) = -\omega^2\mu(x)y$$

Here, $y(x)$ is the sideways displacement of the string from zero; $T(x)$ is the tension in the string (not necessarily a
constant); $\mu(x)$ is the linear mass density of the string (again, it need not be a constant). The time-dependent motion
is really $y(x)\cos(\omega t + \phi)$, but the time dependence does not concern us here. As in the preceding problem, derive the
analog of Eq. (5.15) for this equation. For the analog of Eq. (5.16) state the boundary conditions needed on $y$ and
deduce the corresponding orthogonality equation. This scalar product has the mass density for a weight.

Ans: $\bigl[T(x)\bigl(y_1'y_2^* - y_1y_2^{*\prime}\bigr)\bigr]_a^b = \bigl(\omega_2^{*2} - \omega_1^2\bigr)\int_a^b \mu(x)\,y_2^*y_1\,dx$

6.18 The way to define the sum in example 17 is

$$\sum_x |f(x)|^2 = \lim_{c\to0}\,\{\text{the sum of } |f(x)|^2 \text{ for those } x \text{ where } |f(x)|^2 > c > 0\}. \tag{6.30}$$

This makes sense only if for each $c > 0$, $|f(x)|^2$ is greater than $c$ for just a finite number of values of $x$. Show that the
function

$$f(x) = \begin{cases} 1/n & \text{for } x = 1/n\\ 0 & \text{otherwise} \end{cases}$$

is in this vector space, and that the function $f(x) = x$ is not. What is a basis for this space? [Take $0 < x < 1$] This is
an example of a vector space with non-countable dimension.

6.19 In example 10, it is assumed that $\sum_1^\infty |a_k|^2 < \infty$. Show that this implies that the sum used for the scalar product
also converges: $\sum_1^\infty a_k^*b_k$. [Consider the sums $\sum|a_k + ib_k|^2$, $\sum|a_k - ib_k|^2$, $\sum|a_k + b_k|^2$, and $\sum|a_k - b_k|^2$, allowing
complex scalars.]

6.20 Prove strictly from the axioms for a vector space the following four theorems. Each step in your proof must 
explicitly follow from one of the vector space axioms or from a property of scalars or from a previously proved theorem. 

(a) The vector $\vec O$ is unique. [Assume that there are two, $\vec O_1$ and $\vec O_2$. Show that they're equal. First step: use axiom 4.]

(b) The number 0 times any vector is the zero vector: $0\vec v = \vec O$.

(c) The vector $\vec v\,'$ is unique.

(d) $(-1)\vec v = \vec v\,'$.

6.21 For the vector space of polynomials, are the two functions $\{1 + x^2, x + x^3\}$ linearly independent?

6.22 Find the dimension of the space of functions that are linear combinations of

$$\{1,\ \sin x,\ \cos x,\ \sin^2x,\ \cos^2x,\ \sin^4x,\ \cos^4x,\ \sin^2x\cos^2x\}$$


6.23 A model vector space is formed by drawing equidistant parallel lines in a plane and labelling adjacent lines by
successive integers from $-\infty$ to $+\infty$. Define multiplication by a (real) scalar so that multiplication of the vector by $\alpha$
means multiply the distance between the lines by $1/\alpha$. Define addition of two vectors by finding the intersections of the
lines and connecting opposite corners of the parallelograms to form another set of parallel lines. The resulting lines are 
labeled as the sum of the two integers from the intersecting lines. (There are two choices here, if one is addition, what is 
the other?) Show that this construction satisfies all the requirements for a vector space. Just as a directed line segment 
is a good way to picture velocity, this construction is a good way to picture the gradient of a function. In the vector 
space of directed line segments, you pin the vectors down so that they all start from a single point. Here, you pin them 
down so that the lines labeled “zero” all pass through a fixed point. Did I define how to multiply by a negative scalar? 
If not, then you should. This picture of vectors is developed extensively in the text “Gravitation” by Misner, Wheeler, 
and Thorne. 

6.24 In problem 6.11 (g), find a basis for the space. Ans: $1$, $x^2$, $3x - 5x^3$.

6.25 What is the dimension of the set of polynomials of degree less than or equal to 10 and with a triple root at x = 1? 

6.26 Verify that Eq. (6.16) does satisfy the requirements for a scalar product. 

6.27 A variation on problem 6.15: $f_3 = f_1 + f_2$ means

(a) $f_3(x) = Af_1(x - a) + Bf_2(x - b)$ for fixed $a$, $b$, $A$, $B$. For what values of these constants is this a vector space?

(b) Now what about $f_3(x) = f_1(x^3) + f_2(x^3)$?

6.28 Determine if these are vector spaces: 

(1) Pairs of numbers with addition defined as $(x_1, x_2) + (y_1, y_2) = (x_1 + y_2, x_2 + y_1)$ and multiplication by scalars as
$c(x_1, x_2) = (cx_1, cx_2)$.

(2) Like example 2 of section 6.3, but restricted to those $f$ such that $f(x) > 0$. (real scalars)

(3) Like the preceding line, but define addition as $(f + g)(x) = f(x)g(x)$ and $(cf)(x) = \bigl(f(x)\bigr)^c$.

6.29 Do the same calculation as in problem 6.7, but use the scalar product

$$\langle f, g\rangle = \int_0^1 x^2\,dx\,f^*(x)g(x)$$


6.30 Show that the following is a scalar product.

$$\langle f, g\rangle = \int_a^b dx\,\bigl[f^*(x)g(x) + \lambda f^{*\prime}(x)g'(x)\bigr]$$

where $\lambda$ is a constant. What restrictions if any must you place on $\lambda$? The name Sobolev is associated with this scalar
product.

6.31 (a) With the scalar product of problem 6.29, find the angle between the vectors 1 and $x$. Here the word angle
appears in the sense of $\vec A\cdot\vec B = AB\cos\theta$. (b) What is the angle if you use the scalar product of problem 6.7? (c) With
the first of these scalar products, what combination of 1 and $x$ is orthogonal to 1? Ans: 14.48°

6.32 In the online text linked on the second page of this chapter, you will find that section two of chapter three has 
enough additional problems to keep you happy. 

6.33 Show that the sequence of rational numbers $a_n = \sum_{k=1}^{n} 1/k$ is not a Cauchy sequence. What about $\sum_{k=1}^{n} 1/k^2$?

6.34 In the vector space of polynomials of the form $\alpha x + \beta x^3$, use the scalar product $\langle f, g \rangle = \int_0^1 dx\, f(x)^* g(x)$ and
construct an orthogonal basis for this space. Ans: One pair is $x$, $x^3 - \tfrac{3}{5}x$.

6.35 You can construct the Chebyshev polynomials by starting from the successive powers, $x^n$, $n = 0, 1, 2, \ldots$ and
applying the Gram-Schmidt process. The scalar product in this case is

$$ \langle f, g \rangle = \int_{-1}^{1} dx\, \frac{f(x)^* g(x)}{\sqrt{1 - x^2}} $$

The conventional normalization for these polynomials is $T_n(1) = 1$, so you should not try to make the norm of the
resulting vectors one. Construct the first four of these polynomials, and show that these satisfy $T_n(\cos\theta) = \cos(n\theta)$.
These polynomials are used in numerical analysis because they have the property that they oscillate uniformly between
$-1$ and $+1$ on the domain $-1 < x < 1$. Verify that your results for the first four polynomials satisfy the recurrence
relation: $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$. Also show that $\cos\bigl((n+1)\theta\bigr) = 2\cos\theta\cos(n\theta) - \cos\bigl((n-1)\theta\bigr)$.


6.36 In spherical coordinates $(\theta, \phi)$, the angle $\theta$ is measured from the $z$-axis, and the function $f_1(\theta, \phi) = \cos\theta$ can be
written in terms of rectangular coordinates as (section 8.8)

$$ f_1(\theta, \phi) = \cos\theta = \frac{z}{\sqrt{x^2 + y^2 + z^2}} $$

Pick up the function $f_1$ and rotate it by 90° counterclockwise about the positive $y$-axis. Do this rotation in terms of
rectangular coordinates, but express the result in terms of spherical coordinates: sines and cosines of $\theta$ and $\phi$. Call it
$f_2$. Draw a picture and figure out where the original and the rotated function are positive and negative and zero.

Now pick up the same $f_1$ and rotate it by 90° clockwise about the positive $x$-axis, again finally expressing the result in
terms of spherical coordinates. Call it $f_3$.

If now you take the original $f_1$ and rotate it about some random axis by some random angle, show that the resulting
function $f_4$ is a linear combination of the three functions $f_1$, $f_2$, and $f_3$. I.e., all these possible rotated functions
form a three dimensional vector space. Again, calculations such as these are much easier to demonstrate in rectangular
coordinates.

6.37 Take the functions $f_1$, $f_2$, and $f_3$ from the preceding problem and sketch the shape of the functions

$$ r e^{-r} f_1(\theta, \phi), \qquad r e^{-r} f_2(\theta, \phi), \qquad r e^{-r} f_3(\theta, \phi) $$


To sketch these, picture them as defining some sort of density in space, ignoring the fact that they are sometimes 
negative. You can just take the absolute value or the square in order to visualize where they are big or small. Use 
dark and light shading to picture where the functions are big and small. Start by finding where they have the largest 
and smallest magnitudes. See if you can find similar pictures in an introductory chemistry text. Alternately, check out 
winter.group.shef.ac.uk/orbitron/ 

6.38 Use the results of problem 6.17 and apply it to the Legendre equation Eq. (4.55) to demonstrate that the Legendre
polynomials obey $\int_{-1}^{1} dx\, P_n(x) P_m(x) = 0$ if $n \ne m$. Note: the function $T(x)$ from problem 6.17 is zero at these
endpoints. That does not imply that there are no conditions on the functions $y_1$ and $y_2$ at those endpoints. The product
of $T(x)\, y_1'\, y_2$ has to vanish there. Use the result stated just after Eq. (4.59) to show that only the Legendre polynomials
and not the more general solutions of Eq. (4.58) work.

6.39 Using the result of the preceding problem that the Legendre polynomials are orthogonal, show that the equation
(4.62)(a) follows from Eq. (4.62)(e). Square that equation (e) and integrate $\int_{-1}^{1} dx$. Do the integral on the left and
then expand the result in an infinite series in $t$. On the right you have integrals of products of Legendre polynomials,
and only the squared terms are non-zero. Equate like powers of $t$ and you will have the result.


6.40 Use the scalar product of Eq. (6.16) and construct an orthogonal basis using the Gram-Schmidt process and starting 


from 


and 


. Verify that your answer works in at least one special case. 


6.41 For the differential equation $\ddot x + x = 0$, pick a set of independent solutions to the differential equation, any ones
you like. Use the scalar product $\langle f, g \rangle = \int_0^1 dx\, f(x)^* g(x)$ and apply the Gram-Schmidt method to find an orthogonal
basis in this space of solutions. Is there another scalar product that would make this analysis simpler? Sketch the
orthogonal functions that you found.


Operators and Matrices 


You've been using operators for years even if you've never heard the term. Differentiation falls into this category; so does 
rotation; so does wheel-alignment. In the subject of quantum mechanics, familiar ideas such as energy and momentum 
will be represented by operators. You probably think that pressure is simply a scalar, but no. It’s an operator. 

7.1 The Idea of an Operator 

You can understand the subject of matrices as a set of rules that govern certain square or rectangular arrays of numbers — 
how to add them, how to multiply them. Approached this way the subject is remarkably opaque. Who made up these 
rules and why? What's the point? If you look at it as simply a way to write simultaneous linear equations in a compact 
way, it's perhaps convenient but certainly not the big deal that people make of it. It is a big deal. 

There’s a better way to understand the subject, one that relates the matrices to more fundamental ideas and that 
even provides some geometric insight into the subject. The technique of similarity transformations may even make a 
little sense. This approach is precisely parallel to one of the basic ideas in the use of vectors. You can draw pictures 
of vectors and manipulate the pictures of vectors and that’s the right way to look at certain problems. You quickly 
find however that this can be cumbersome. A general method that you use to make computations tractable is to write
vectors in terms of their components. The methods for manipulating the components then follow a few straightforward
rules: adding the components, multiplying them by scalars, even doing dot and cross products.

Just as you have components of vectors, which are a set of numbers that depend on your choice of basis, matrices 
are a set of numbers that are components of — not vectors, but functions (also called operators or transformations or 
tensors). I’ll start with a couple of examples before going into the precise definitions. 

The first example of the type of function that I'll be interested in will be a function defined on the two-dimensional
vector space, arrows drawn in the plane with their starting points at the origin. The function that I'll use will rotate each
vector by an angle $\alpha$ counterclockwise. This is a function, where the input is a vector and the output is a vector.

[Figure: the sum $\vec v_1 + \vec v_2$ and its rotated image $f(\vec v_1 + \vec v_2)$.]

What happens if you change the argument of this function, multiplying it by a scalar? You know $f(\vec v)$, what is
$f(c\vec v)$? Just from the picture, this is $c$ times the vector that you got by rotating $\vec v$. What happens when you add two
vectors and then rotate the result? The whole parallelogram defining the addition will rotate through the same angle $\alpha$,
so whether you apply the function before or after adding the vectors you get the same result.

This leads to the definition of the word linearity:

$$ f(c\vec v) = c f(\vec v), \quad\text{and}\quad f(\vec v_1 + \vec v_2) = f(\vec v_1) + f(\vec v_2) \tag{7.1} $$


Keep your eye on this pair of equations! They're central to the whole subject. 
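If you want to see the two linearity conditions hold with actual numbers, here is a minimal Python sketch. It assumes the usual component formula for a counterclockwise rotation in the plane (the one derived later in this chapter), and the names `rotate`, `add`, `scale`, and `close` are illustrative helpers, not anything defined in the text.

```python
import math

def rotate(v, alpha):
    """Rotate the plane vector v = (x, y) counterclockwise by alpha radians."""
    x, y = v
    return (x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha))

def add(u, v):
    """Component-wise sum of two plane vectors."""
    return (u[0] + v[0], u[1] + v[1])

def scale(c, v):
    """Multiply the plane vector v by the scalar c."""
    return (c * v[0], c * v[1])

def close(u, v, tol=1e-12):
    """True if the two vectors agree to within rounding error."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

alpha = 0.7                       # arbitrary angle in radians
v1, v2, c = (1.0, 2.0), (-3.0, 0.5), 2.5

# f(c v) = c f(v)
homogeneous = close(rotate(scale(c, v1), alpha), scale(c, rotate(v1, alpha)))
# f(v1 + v2) = f(v1) + f(v2)
additive = close(rotate(add(v1, v2), alpha),
                 add(rotate(v1, alpha), rotate(v2, alpha)))
print(homogeneous, additive)
```

Replacing `rotate` by something nonlinear, such as a function that squares each component, makes both checks fail, which is a quick way to convince yourself that the two conditions really do carry content.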

Another example of the type of function that I'll examine is from physics instead of mathematics. A rotating rigid 
body has some angular momentum. The greater the rotation rate, the greater the angular momentum will be. Now how 
do I compute the angular momentum assuming that I know the shape and the distribution of masses in the body and 
that I know the body's angular velocity? The body is made of a lot of point masses (atoms), but you don't need to go
down to that level to make sense of the subject. As with any other integral, you start by dividing the object into a lot
of small pieces.

What is the angular momentum of a single point mass? It starts from basic Newtonian mechanics, and the equation
$\vec F = d\vec p/dt$. (It's better in this context to work with this form than with the more common expression $\vec F = m\vec a$.) Take
the cross product with $\vec r$, the displacement vector from the origin.


$$ \vec r \times \vec F = \vec r \times \frac{d\vec p}{dt} $$

Add and subtract the same thing on the right side of the equation (add zero) to get

$$ \vec r \times \vec F = \vec r \times \frac{d\vec p}{dt} + \frac{d\vec r}{dt} \times \vec p - \frac{d\vec r}{dt} \times \vec p = \frac{d}{dt}\bigl(\vec r \times \vec p\bigr) - \frac{d\vec r}{dt} \times \vec p $$

Now recall that $\vec p$ is $m\vec v$, and $\vec v = d\vec r/dt$, so the last term in the preceding equation is zero because you are taking the
cross product of a vector with itself. This means that when adding and subtracting a term from the right side above, I
was really adding and subtracting zero.

$\vec r \times \vec F$ is the torque applied to the point mass $m$ and $\vec r \times \vec p$ is the mass's angular momentum about the origin. Now
if there are many masses and many forces, simply put an index on this torque equation and add the resulting equations
over all the masses in the rigid body. The sums on the left and the right provide the definitions of torque and of angular
momentum.

$$ \vec\tau_{\text{total}} = \sum_k \vec r_k \times \vec F_k = \frac{d}{dt} \sum_k \bigl(\vec r_k \times \vec p_k\bigr) = \frac{d\vec L}{dt} $$

For a specific example, attach two masses to the ends of a light rod and attach that rod to a second, vertical one
as sketched, at an angle. Now spin the vertical rod and figure out what the angular velocity and angular momentum
vectors are. Since the spin is along the vertical rod, that specifies the direction of the angular velocity vector $\vec\omega$ to be
upwards in the picture. (Viewed from above everything is rotating counter-clockwise.) The angular momentum of one
point mass is $\vec r \times \vec p = \vec r \times m\vec v$. The mass on the right has a velocity pointing into the page and the mass on the left has
it pointing out. Take the origin to be where the supporting rod is attached to the axis, then $\vec r \times \vec p$ for the mass on the
right is pointing up and to the left. For the other mass both $\vec r$ and $\vec p$ are reversed, so the cross product is in exactly the
same direction as for the first mass. The total angular momentum is the sum of these two parallel vectors, and it is not in
the direction of the angular velocity.

[Figure: the two masses on the tilted rod; $\vec v_1$ points into the page, and $\vec r_1 \times m_1 \vec v_1$ points up and to the left.]

Now make this quantitative and apply it to a general rigid body. There are two basic pieces to the problem: the
angular momentum of a point mass and the velocity of a point mass in terms of its angular velocity. The position of
one point mass is described by its displacement vector from the origin, $\vec r$. Its angular momentum is then $\vec r \times \vec p$, where
$\vec p = m\vec v$. If the rigid body has an angular velocity vector $\vec\omega$, the linear velocity of a mass at coordinate $\vec r$ is $\vec\omega \times \vec r$.

The total angular momentum of a rotating set of masses $m_k$ at respective coordinates $\vec r_k$ is the sum of all the
individual pieces of angular momentum

$$ \vec L = \sum_k \vec r_k \times m_k \vec v_k, \quad\text{and since } \vec v_k = \vec\omega \times \vec r_k, \quad \vec L = \sum_k m_k\, \vec r_k \times \bigl(\vec\omega \times \vec r_k\bigr) \tag{7.2} $$

If you have a continuous distribution of mass then using an integral makes more sense. For a given distribution of mass,
this integral (or sum) depends on the vector $\vec\omega$. It defines a function having a vector as input and a vector $\vec L$ as output.
Denote the function by $I$, so $\vec L = I(\vec\omega)$.


$$ \vec L = \int dm\, \vec r \times \bigl(\vec\omega \times \vec r\bigr) = I(\vec\omega) \tag{7.3} $$

This function satisfies the same linearity equations as Eq. (7.1). When you multiply $\vec\omega$ by a constant, the output,
$\vec L$ is multiplied by the same constant. When you add two $\vec\omega$'s together as the argument, the properties of the cross
product and of the integral guarantee that the corresponding $\vec L$'s are added.

$$ I(c\vec\omega) = c I(\vec\omega), \quad\text{and}\quad I(\vec\omega_1 + \vec\omega_2) = I(\vec\omega_1) + I(\vec\omega_2) $$

This function $I$ is called the "inertia operator" or more commonly the "inertia tensor." It's not simply multiplication
by a scalar, so the rule that appears in an introductory course in mechanics ($\vec L = I\vec\omega$) is valid only in special cases, for
example those with enough symmetry.

Note: $I$ is not a vector and $\vec L$ is not a function. $\vec L$ is the output of the function $I$ when you feed it the argument
$\vec\omega$. This is the same sort of observation appearing in section 6.3 under "Function Spaces."
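A short Python sketch can make the distinction between the function $I$ and its output $\vec L$ concrete. It evaluates the sum form of Eq. (7.2) for a handful of point masses; the particular masses and positions are made up for illustration, and the helper names (`cross`, `add`, `scale`, `inertia`) are assumptions of this sketch.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

# Hypothetical rigid body: a list of (mass, position) pairs.
masses = [(2.0, (1.0, 0.5, 0.0)),
          (1.0, (-1.0, -0.5, 0.0)),
          (3.0, (0.0, 1.0, 2.0))]

def inertia(omega):
    """The function I: takes omega, returns L = sum_k m_k r_k x (omega x r_k)."""
    L = (0.0, 0.0, 0.0)
    for m, r in masses:
        L = add(L, scale(m, cross(r, cross(omega, r))))
    return L

omega = (0.0, 1.0, 0.0)
L = inertia(omega)

# L is not parallel to omega: their cross product is not zero.
L_cross_omega = cross(L, omega)

# Linearity, checked on a second (arbitrary) angular velocity.
w2, c = (0.3, -1.2, 2.0), 4.0
lin_scale = max(abs(p - q) for p, q in zip(inertia(scale(c, w2)), scale(c, inertia(w2))))
lin_add = max(abs(p - q) for p, q in zip(inertia(add(omega, w2)),
                                         add(inertia(omega), inertia(w2))))
print(L, L_cross_omega, lin_scale, lin_add)
```

Here `inertia` plays the role of $I$ and the tuple it returns plays the role of $\vec L$; for this body and this $\vec\omega$ the two vectors point in different directions, which is the general situation.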

If an electromagnetic wave passes through a crystal, the electric field will push the electrons around, and the
bigger the electric field, the greater the distance that the electrons will be pushed. They may not be pushed in the
same direction as the electric field however, as the nature of the crystal can make it easier to push the electrons in one
direction than in another. The relation between the applied field and the average electron displacement is a function
that (for moderate size fields) obeys the same linearity relation that the two previous functions do.

$$ \vec P = \alpha(\vec E) $$

$\vec P$ is the electric dipole moment density and $\vec E$ is the applied electric field. The function $\alpha$ is called the polarizability.

If you have a mass attached to six springs that are in turn attached to six walls, the mass will
come to equilibrium somewhere. Now push on this mass with another (not too large) force. The mass
will move, but will it move in the direction that you push it? If the six springs are all the same it
will, but if they're not then the displacement will be more in the direction of the weaker springs. The
displacement, $\vec d$, will still however depend linearly on the applied force, $\vec F$.

7.2 Definition of an Operator 

An operator, also called a linear transformation, is a particular type of function. It is, first of all, a vector-valued function
of a vector variable. Second, it is linear; that is, if $A$ is such a function then $A(\vec v)$ is a vector, and

$$ A(\alpha \vec v_1 + \beta \vec v_2) = \alpha A(\vec v_1) + \beta A(\vec v_2). \tag{7.4} $$

The domain is the set of variables on which the operator is defined. The range is the set of all values put out by the 
function. Are there nonlinear operators? Yes, but not here. 

7.3 Examples of Operators 

The four cases that I started with, rotation in the plane, angular momentum of a rotating rigid body, polarization of a 
crystal by an electric field, and the mass attached to some springs all fit this definition. Other examples: 

5. The simplest example of all is just multiplication by a scalar: $A(\vec v) = c\vec v$ for all $\vec v$. This applies to any vector space
and its domain is the entire space.

6. On the vector space of all real valued functions on a given interval, multiply any function $f$ by $1 + x^2$: $(Af)(x) =
(1 + x^2) f(x)$. The domain of $A$ is the entire space of functions of $x$. This is an infinite dimensional vector space,
but no matter. There's nothing special about $1 + x^2$, and any other function will do to define an operator.

7. On the vector space of square integrable functions [$\int_a^b dx\, |f(x)|^2 < \infty$] on $a < x < b$, define the operator as
multiplication by $x$. The only distinction to make here is that if the interval is infinite, then $x f(x)$ may not itself
be square integrable. The domain of this operator in this case is therefore not the entire space, but just those
functions such that $x f(x)$ is also square-integrable. On the same vector space, differentiation is a linear operator:
$(Af)(x) = f'(x)$. This too has a restriction on the domain: It is necessary that $f'$ also exist and be square
integrable.

8. On the vector space of infinitely differentiable functions, the operation of differentiation, $d/dx$, is itself a linear
operator. It's certainly linear, and it takes a differentiable function into a differentiable function.

So where are the matrices? This chapter started by saying that I'm going to show you the inside scoop on matrices and
so far I've failed to produce even one.

When you describe vectors you can use a basis as a computational tool and manipulate the vectors using their
components. In the common case of three-dimensional vectors we usually denote the basis in one of several ways

$$ \hat\imath,\ \hat\jmath,\ \hat k, \quad\text{or}\quad \hat x,\ \hat y,\ \hat z, \quad\text{or}\quad \vec e_1,\ \vec e_2,\ \vec e_3 $$

and they all mean the same thing. The first form is what you see in the introductory physics texts. The second form is
one that you encounter in more advanced books, and the third one is more suitable when you want to have a compact
index notation. It's that third one that I'll use here; it has the advantage that it doesn't bias you to believe that you
must be working in three spatial dimensions. The index could go beyond 3, and the vectors that you're dealing with may
not be the usual geometric arrows. (And why does it have to start with one? Maybe I want the indices 0, 1, 2 instead.)
These need not be perpendicular to each other or even to be unit vectors.

The way to write a vector $\vec v$ in components is

$$ \vec v = v_x \hat x + v_y \hat y + v_z \hat z, \quad\text{or}\quad v_1 \vec e_1 + v_2 \vec e_2 + v_3 \vec e_3 = \sum_k v_k \vec e_k \tag{7.5} $$

Once you've chosen a basis, you can find the three numbers that form the components of that vector. In a similar
way, define the components of an operator, only that will take nine numbers to do it (in three dimensions). If you
evaluate the effect of an operator on any one of the basis vectors, the output is a vector. That's part of the definition
of the word operator. This output vector can itself be written in terms of this same basis. The defining equation for the
components of an operator $f$ is

$$ f(\vec e_i) = \sum_{k=1}^{3} f_{ki}\, \vec e_k \tag{7.6} $$

For each input vector you have the three components of the output vector. Pay careful attention to this equation!
It is the defining equation for the entire subject of matrix theory, and everything in that subject comes from this one
innocuous looking equation. (And yes if you're wondering, I wrote the indices in the correct order.)

Why? 

Take an arbitrary input vector for $f$: $\vec u = f(\vec v)$. Both $\vec u$ and $\vec v$ are vectors, so write them in terms of the basis
chosen.

$$ \vec u = \sum_k u_k \vec e_k = f(\vec v) = f\Bigl(\sum_i v_i \vec e_i\Bigr) = \sum_i v_i f(\vec e_i) \tag{7.7} $$

The last equation is the result of the linearity property, Eq. (7.1), already assumed for $f$. Now pull the sum and the
numerical factors $v_i$ out in front of the function, and write it out. It is then clear:

$$ f(v_1 \vec e_1 + v_2 \vec e_2) = f(v_1 \vec e_1) + f(v_2 \vec e_2) = v_1 f(\vec e_1) + v_2 f(\vec e_2) $$

Now you see where the defining equation for operator components comes in. Eq. (7.7) is

$$ \sum_k u_k \vec e_k = \sum_i v_i \sum_k f_{ki}\, \vec e_k $$


For two vectors to be equal, the corresponding coefficients of $\vec e_1$, $\vec e_2$, etc. must match; their respective components must
be equal, and this is

$$ u_k = \sum_i v_i f_{ki}, \quad\text{usually written}\quad u_k = \sum_i f_{ki} v_i \tag{7.8} $$

so that in the latter form it starts to resemble what you may think of as matrix manipulation. $f_{\text{row},\text{column}}$ is the
conventional way to write the indices, and multiplication is defined so that the following product means Eq. (7.8).

$$ \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \tag{7.9} $$

The first row of the matrix, multiplied into the column, is $u_1 = f_{11} v_1 + f_{12} v_2 + f_{13} v_3$, etc.


And this is the reason behind the definition of how to multiply a matrix and a column matrix. The order in which the 
indices appear is the conventional one, and the indices appear in the matrix as they do because I chose the order of the 
indices in a (seemingly) backwards way in Eq. (7.6). 
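The recipe "apply the function to each basis vector and read off a column" is easy to try out. The following Python sketch does that for a made-up linear function on three components (the function `example_operator` is hypothetical, chosen only so the numbers are easy to follow) and then checks that the resulting array reproduces the function through Eq. (7.8).

```python
def components(f, dim):
    """Matrix F with F[k][i] defined by f(e_i) = sum_k F[k][i] e_k."""
    columns = []
    for i in range(dim):
        e_i = [1.0 if j == i else 0.0 for j in range(dim)]
        columns.append(f(e_i))            # the output f(e_i) is column i
    # Rearrange the list of columns into rows: F[k][i] = columns[i][k].
    return [[columns[i][k] for i in range(dim)] for k in range(dim)]

def apply(F, v):
    """u_k = sum_i F[k][i] v_i, the component form of u = f(v)."""
    return [sum(F[k][i] * v[i] for i in range(len(v))) for k in range(len(F))]

def example_operator(v):
    """A hypothetical linear function of a 3-component vector."""
    x, y, z = v
    return [y + 2.0 * z, x, 3.0 * z - x]

F = components(example_operator, 3)
v = [1.0, -2.0, 0.5]
u_from_matrix = apply(F, v)
u_direct = example_operator(v)
print(F)
print(u_from_matrix, u_direct)
```

Notice that the first index of `F[k][i]` labels the component of the output and the second labels which basis vector went in, which is the index order of the defining equation.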


Components of Rotations 

Apply this to the first example, rotate all vectors in the plane through the angle $\alpha$. I don't want to keep using the same
symbol $f$ for every function, so I'll call this function $R$ instead, or better yet $R_\alpha$. $R_\alpha(\vec v)$ is the rotated vector. Pick two
perpendicular unit vectors for a basis. You may call them $\hat x$ and $\hat y$, but again I'll call them $\vec e_1$ and $\vec e_2$. Use the definition
of components to get

$$ R_\alpha(\vec e_1) = \sum_k R_{k1} \vec e_k, \qquad R_\alpha(\vec e_2) = \sum_k R_{k2} \vec e_k \tag{7.10} $$

The rotated $\vec e_1$ has two components, so

$$ R_\alpha(\vec e_1) = \vec e_1 \cos\alpha + \vec e_2 \sin\alpha = R_{11} \vec e_1 + R_{21} \vec e_2 \tag{7.11} $$

This determines the first column of the matrix of components,

$$ R_{11} = \cos\alpha, \quad\text{and}\quad R_{21} = \sin\alpha $$

Similarly the effect on the other basis vector determines the second column:

$$ R_\alpha(\vec e_2) = \vec e_2 \cos\alpha - \vec e_1 \sin\alpha = R_{12} \vec e_1 + R_{22} \vec e_2 \tag{7.12} $$

Check: $R_\alpha(\vec e_1) \cdot R_\alpha(\vec e_2) = 0$.

$$ R_{12} = -\sin\alpha, \quad\text{and}\quad R_{22} = \cos\alpha $$

The component matrix is then

$$ (R_\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \tag{7.13} $$


Components of Inertia 

The definition, Eq. (7.3), and the figure preceding it specify the inertia tensor as the function that relates the angular
momentum of a rigid body to its angular velocity.

$$ \vec L = \int dm\, \vec r \times \bigl(\vec\omega \times \vec r\bigr) = I(\vec\omega) \tag{7.14} $$

Use the vector identity,

$$ \vec A \times (\vec B \times \vec C) = \vec B (\vec A \cdot \vec C) - \vec C (\vec A \cdot \vec B) \tag{7.15} $$

then the integral is

$$ \vec L = \int dm \bigl[ \vec\omega (\vec r \cdot \vec r) - \vec r (\vec\omega \cdot \vec r) \bigr] = I(\vec\omega) \tag{7.16} $$

Pick the common rectangular, orthogonal basis and evaluate the components of this function. Equation (7.6) says
$I(\vec e_i) = \sum_k I_{ki} \vec e_k$, and $\vec r = x \vec e_1 + y \vec e_2 + z \vec e_3$ so

$$ I(\vec e_1) = \int dm \bigl[ \vec e_1 (x^2 + y^2 + z^2) - (x \vec e_1 + y \vec e_2 + z \vec e_3)(x) \bigr] = I_{11} \vec e_1 + I_{21} \vec e_2 + I_{31} \vec e_3 $$

from which

$$ I_{11} = \int dm\, (y^2 + z^2), \qquad I_{21} = -\int dm\, yx, \qquad I_{31} = -\int dm\, zx $$

This provides the first column of the components, and you get the rest of the components the same way. The whole
matrix is

$$ \int dm \begin{pmatrix} y^2 + z^2 & -xy & -xz \\ -xy & x^2 + z^2 & -yz \\ -xz & -yz & x^2 + y^2 \end{pmatrix} \tag{7.17} $$


These are the components of the tensor of inertia. The diagonal elements of the matrix may be familiar; they
are the moments of inertia. $x^2 + y^2$ is the perpendicular distance-squared to the $z$-axis, so the element $I_{33}$ ($= I_{zz}$) is
the moment of inertia about that axis, $\int dm\, r_\perp^2$. The other components are less familiar and are called the products of
inertia. This particular matrix is symmetric: $I_{ij} = I_{ji}$. That's a special property of the inertia tensor.
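For point masses the integrals in Eq. (7.17) become sums, and a few lines of Python are enough to build the matrix and compare it with the cross-product form of Eq. (7.14). This is a sketch under assumed data: the masses and positions are invented for illustration, and the compact form $m\,(r^2 \delta_{ij} - r_i r_j)$ used in the code is just Eq. (7.17) written entry by entry.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_matrix(masses):
    """Components I[i][j] = sum_k m_k (r_k^2 delta_ij - r_ki r_kj), Eq. (7.17) for point masses."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in masses:
        r2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                I[i][j] += m * (r2 * delta - r[i] * r[j])
    return I

def apply(I, w):
    """L_i = sum_j I[i][j] w_j."""
    return tuple(sum(I[i][j] * w[j] for j in range(3)) for i in range(3))

def angular_momentum(masses, w):
    """Direct evaluation of L = sum_k m_k r_k x (w x r_k)."""
    L = [0.0, 0.0, 0.0]
    for m, r in masses:
        term = cross(r, cross(w, r))
        for i in range(3):
            L[i] += m * term[i]
    return tuple(L)

# Hypothetical set of point masses: (mass, position).
masses = [(2.0, (1.0, 0.5, 0.0)),
          (1.0, (-1.0, -0.5, 0.0)),
          (3.0, (0.0, 1.0, 2.0))]

I = inertia_matrix(masses)
w = (0.3, -1.2, 2.0)
mismatch = max(abs(p - q) for p, q in zip(apply(I, w), angular_momentum(masses, w)))
symmetric = all(I[i][j] == I[j][i] for i in range(3) for j in range(3))
print(I, mismatch, symmetric)
```

The second column of `I` is what you get by feeding $\vec e_2$ into the function, exactly as the first column was found in the text by feeding in $\vec e_1$.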

Components of Dumbbell 

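Look again at the specific case of two masses rotating about an axis. Do it quantitatively.

[Figure: the two masses on the tilted rod with the basis $\vec e_1$ (right) and $\vec e_2$ (up); $\vec v_1$ points into the page, and the vectors $\vec r_1 \times m_1 \vec v_1$ and $\vec r_2 \times m_2 \vec v_2$ are parallel.]

The integrals in Eq. (7.17) are simply sums this time, and the sums have just two terms. I'm making the
approximation that these are point masses. Make the coordinate system match the indicated basis, with $x$ right and $y$
up, then $z$ is zero for all terms in the sum, and the rest are

$$ \int dm\, (y^2 + z^2) = m_1 r_1^2 \cos^2\alpha + m_2 r_2^2 \cos^2\alpha $$
$$ -\int dm\, xy = -m_1 r_1^2 \cos\alpha \sin\alpha - m_2 r_2^2 \cos\alpha \sin\alpha $$
$$ \int dm\, (x^2 + z^2) = m_1 r_1^2 \sin^2\alpha + m_2 r_2^2 \sin^2\alpha $$
$$ \int dm\, (x^2 + y^2) = m_1 r_1^2 + m_2 r_2^2 $$

The matrix is then

$$ (I) = \bigl(m_1 r_1^2 + m_2 r_2^2\bigr) \begin{pmatrix} \cos^2\alpha & -\cos\alpha \sin\alpha & 0 \\ -\cos\alpha \sin\alpha & \sin^2\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{7.18} $$

Don't count on all such results factoring so nicely.

In this basis, the angular velocity $\vec\omega$ has just one component, so what is $\vec L$?

$$ \bigl(m_1 r_1^2 + m_2 r_2^2\bigr) \begin{pmatrix} \cos^2\alpha & -\cos\alpha \sin\alpha & 0 \\ -\cos\alpha \sin\alpha & \sin^2\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ \omega \\ 0 \end{pmatrix} = \bigl(m_1 r_1^2 + m_2 r_2^2\bigr) \begin{pmatrix} -\omega \cos\alpha \sin\alpha \\ \omega \sin^2\alpha \\ 0 \end{pmatrix} $$

Translate this into vector form:

$$ \vec L = \bigl(m_1 r_1^2 + m_2 r_2^2\bigr)\, \omega \sin\alpha\, \bigl( -\vec e_1 \cos\alpha + \vec e_2 \sin\alpha \bigr) \tag{7.19} $$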


When $\alpha = 90°$, then $\cos\alpha = 0$ and the angular momentum points along the $y$-axis. This is the symmetric special case
where everything lines up along one axis. Notice that if $\alpha = 0$ then everything vanishes, but then the masses are both
on the axis, and they have no angular momentum. In the general case as drawn, the vector $\vec L$ points to the upper left,
perpendicular to the line between the masses.
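The closed form for the dumbbell can be checked against the defining sum $\vec L = \sum_k m_k \vec r_k \times (\vec\omega \times \vec r_k)$. The Python sketch below assumes the geometry described in the text: the rod makes the angle $\alpha$ with the vertical ($y$) axis, mass 1 sits at distance $r_1$ up and to the right, mass 2 at distance $r_2$ in the opposite direction, and $\vec\omega$ points along $y$. The numerical values are arbitrary.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Arbitrary illustrative values.
m1, m2 = 2.0, 1.0
r1, r2 = 1.5, 0.5
alpha = 0.6            # angle between the rod and the vertical axis, radians
omega = 3.0            # angular speed about the y-axis

rod = (math.sin(alpha), math.cos(alpha), 0.0)        # unit vector along the rod
pos1 = tuple(r1 * c for c in rod)
pos2 = tuple(-r2 * c for c in rod)
w = (0.0, omega, 0.0)

# Direct sum over the two point masses.
L_direct = [0.0, 0.0, 0.0]
for m, r in ((m1, pos1), (m2, pos2)):
    term = cross(r, cross(w, r))
    for i in range(3):
        L_direct[i] += m * term[i]

# Closed form: (m1 r1^2 + m2 r2^2) omega sin(alpha) (-e1 cos(alpha) + e2 sin(alpha)).
pref = (m1 * r1 ** 2 + m2 * r2 ** 2) * omega * math.sin(alpha)
L_formula = (-pref * math.cos(alpha), pref * math.sin(alpha), 0.0)

mismatch = max(abs(p - q) for p, q in zip(L_direct, L_formula))
L_dot_rod = sum(p * q for p, q in zip(L_direct, rod))   # zero if L is perpendicular to the rod
print(L_direct, L_formula, mismatch, L_dot_rod)
```

The vanishing dot product is the numerical version of the statement that $\vec L$ is perpendicular to the line between the masses.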


Parallel Axis Theorem 

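When you know the tensor of inertia about one origin, you can relate the result to the tensor about a different origin.
The center of mass of an object is

$$ \vec r_{\rm cm} = \frac{1}{M} \int \vec r\, dm \tag{7.20} $$

where $M$ is the total mass. Compare the operator $I$ using an origin at the center of mass to $I$ about another origin.

$$ I(\vec\omega) = \int dm\, \vec r \times (\vec\omega \times \vec r) = \int dm\, [\vec r - \vec r_{\rm cm} + \vec r_{\rm cm}] \times \bigl( \vec\omega \times [\vec r - \vec r_{\rm cm} + \vec r_{\rm cm}] \bigr) $$
$$ = \int dm\, [\vec r - \vec r_{\rm cm}] \times \bigl( \vec\omega \times [\vec r - \vec r_{\rm cm}] \bigr) + \int dm\, \vec r_{\rm cm} \times (\vec\omega \times \vec r_{\rm cm}) + \text{two cross terms} \tag{7.21} $$

The two cross terms vanish, problem 7.17. What's left is

$$ I(\vec\omega) = \int dm\, [\vec r - \vec r_{\rm cm}] \times \bigl( \vec\omega \times [\vec r - \vec r_{\rm cm}] \bigr) + M\, \vec r_{\rm cm} \times (\vec\omega \times \vec r_{\rm cm}) $$
$$ = I_{\rm cm}(\vec\omega) + M\, \vec r_{\rm cm} \times (\vec\omega \times \vec r_{\rm cm}) \tag{7.22} $$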


Put this in words and it says that the tensor of inertia about any point is equal to the tensor of inertia about the center 
of mass plus the tensor of inertia of a point mass M placed at the center of mass. 

As an example, place a disk of mass $M$ and radius $R$ and uniform mass density so that its center is at $(x, y, z) =
(R, 0, 0)$ and it is lying in the $x$-$y$ plane. Compute the components of the inertia tensor. First get the components about
the center of mass, using Eq. (7.17).

The integrals such as

$$ -\int dm\, xy, \qquad -\int dm\, yz $$

are zero. For fixed $y$ each positive value of $x$ has a corresponding negative value to make the integral add to zero. It is
odd in $x$ (or $y$); remember that this is about the center of the disk. Next do the $I_{33}$ integral.

$$ \int dm\, (x^2 + y^2) = \int dm\, r^2 = \int \frac{M}{\pi R^2}\, dA\, r^2 $$

For the element of area, use $dA = 2\pi r\, dr$ and you have

$$ I_{33} = \frac{M}{\pi R^2} \int_0^R dr\, 2\pi r^3 = \frac{M}{\pi R^2}\, 2\pi \frac{R^4}{4} = \frac{1}{2} M R^2 $$

For the next two diagonal elements,

$$ I_{11} = \int dm\, (y^2 + z^2) = \int dm\, y^2 \quad\text{and}\quad I_{22} = \int dm\, (x^2 + z^2) = \int dm\, x^2 $$

Because of the symmetry of the disk, these two are equal, also you see that the sum is

$$ I_{11} + I_{22} = \int dm\, y^2 + \int dm\, x^2 = I_{33} = \frac{1}{2} M R^2 \tag{7.23} $$

This saves integration. $I_{11} = I_{22} = M R^2 / 4$.

For the other term in the sum (7.22), you have a point mass at the distance $R$ along the $x$-axis, $(x, y, z) = (R, 0, 0)$.
Substitute this point mass into Eq. (7.17) and you have

$$ M \begin{pmatrix} 0 & 0 & 0 \\ 0 & R^2 & 0 \\ 0 & 0 & R^2 \end{pmatrix} $$

The total about the origin is the sum of these two calculations.

$$ M R^2 \begin{pmatrix} 1/4 & 0 & 0 \\ 0 & 5/4 & 0 \\ 0 & 0 & 3/2 \end{pmatrix} $$

Why is this called the parallel axis theorem when you’re translating a point (the origin) and not an axis? Probably 
because this was originally stated for the moment of inertia alone and not for the whole tensor. In that case you have 
only an axis to deal with. 
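The tensor form of the theorem, Eq. (7.22), can be tested numerically on a set of point masses, where every integral is a finite sum. In this Python sketch the masses and positions are invented for illustration, and `inertia_matrix` and `center_of_mass` are helper names assumed here; the check is that the tensor about the origin equals the tensor about the center of mass plus that of a single point mass $M$ placed at the center of mass.

```python
def inertia_matrix(masses):
    """I[i][j] = sum_k m_k (r_k^2 delta_ij - r_ki r_kj), Eq. (7.17) for point masses."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in masses:
        r2 = sum(c * c for c in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                I[i][j] += m * (r2 * delta - r[i] * r[j])
    return I

def center_of_mass(masses):
    """Total mass M and the center of mass (1/M) sum_k m_k r_k."""
    M = sum(m for m, _ in masses)
    r_cm = tuple(sum(m * r[i] for m, r in masses) / M for i in range(3))
    return M, r_cm

# Hypothetical body: (mass, position) pairs.
masses = [(1.0, (1.0, 0.0, 0.0)),
          (2.0, (0.0, 2.0, 1.0)),
          (3.0, (-1.0, 1.0, 4.0))]

M, r_cm = center_of_mass(masses)
shifted = [(m, tuple(r[i] - r_cm[i] for i in range(3))) for m, r in masses]

I_origin = inertia_matrix(masses)        # tensor about the origin
I_cm = inertia_matrix(shifted)           # tensor about the center of mass
I_point = inertia_matrix([(M, r_cm)])    # point mass M at the center of mass

max_diff = max(abs(I_origin[i][j] - I_cm[i][j] - I_point[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```

The difference comes out at the level of rounding error, including in the off-diagonal products of inertia, which the single-axis version of the theorem says nothing about.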

Components of the Derivative 

The set of all polynomials in $x$ having degree $\le 2$ forms a vector space. There are three independent vectors that I can
choose to be $1$, $x$, and $x^2$. Differentiation is a linear operator on this space because the derivative of a sum is the sum
of the derivatives and the derivative of a constant times a function is the constant times the derivative of the function.
With this basis I'll compute the components of $d/dx$. Start the indexing for the basis from zero instead of one because
it will cause less confusion between powers and subscripts.

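$$ \vec e_0 = 1, \qquad \vec e_1 = x, \qquad \vec e_2 = x^2 $$

By the definition of the components of an operator (I'll call this one $D$),

$$ D(\vec e_0) = \frac{d}{dx} 1 = 0, \qquad D(\vec e_1) = \frac{d}{dx} x = 1 = \vec e_0, \qquad D(\vec e_2) = \frac{d}{dx} x^2 = 2x = 2 \vec e_1 $$

These define the three columns of the matrix.

$$ (D) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \qquad \text{check: } \frac{dx^2}{dx} = 2x \text{ is } \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} $$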
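The same construction works for polynomials of any maximum degree $n$, and it is short enough to write out in Python. In this sketch a polynomial is stored as its list of coefficients in the basis $1, x, \ldots, x^n$; the function names `derivative_matrix` and `apply` are illustrative choices, not notation from the text.

```python
def derivative_matrix(n):
    """Components of d/dx on polynomials of degree <= n in the basis 1, x, ..., x^n."""
    D = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # d/dx x^i = i x^(i-1): column i has the single entry i in row i-1.
        D[i - 1][i] = i
    return D

def apply(D, coeffs):
    """Components of the output polynomial: u_k = sum_i D[k][i] v_i."""
    return [sum(D[k][i] * coeffs[i] for i in range(len(coeffs))) for k in range(len(D))]

D = derivative_matrix(2)
p = [5, -3, 4]                 # the polynomial 5 - 3x + 4x^2
dp = apply(D, p)               # components of its derivative, -3 + 8x
print(D)
print(dp)
```

Applying `D` twice to any coefficient list of length three leaves only a constant, and a third application gives zero, which mirrors what repeated differentiation does to a quadratic.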



There's nothing here about the basis being orthonormal. It isn't.

7.4 Matrix Multiplication

How do you multiply two matrices? There's a rule for doing it, but where does it come from?

The composition of two functions means you first apply one function then the other, so

$$ h = f \circ g \quad\text{means}\quad h(\vec v) = f\bigl(g(\vec v)\bigr) \tag{7.24} $$

I'm assuming that these are vector-valued functions of a vector variable, but this is the general definition of composition
anyway. If $f$ and $g$ are linear, does it follow that $h$ is? Yes, just check:

$$ h(c\vec v) = f\bigl(g(c\vec v)\bigr) = f\bigl(c\, g(\vec v)\bigr) = c\, f\bigl(g(\vec v)\bigr), \quad\text{and} $$
$$ h(\vec v_1 + \vec v_2) = f\bigl(g(\vec v_1 + \vec v_2)\bigr) = f\bigl(g(\vec v_1) + g(\vec v_2)\bigr) = f\bigl(g(\vec v_1)\bigr) + f\bigl(g(\vec v_2)\bigr) $$

What are the components of $h$? Again, use the definition and plug in.

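$$ h(\vec e_i) = \sum_k h_{ki} \vec e_k = f\bigl(g(\vec e_i)\bigr) = f\Bigl(\sum_j g_{ji} \vec e_j\Bigr) = \sum_j g_{ji} f(\vec e_j) = \sum_j g_{ji} \sum_k f_{kj} \vec e_k \tag{7.25} $$

and now all there is to do is to equate the corresponding coefficients of $\vec e_k$.

$$ h_{ki} = \sum_j g_{ji} f_{kj} \quad\text{or more conventionally}\quad h_{ki} = \sum_j f_{kj} g_{ji} \tag{7.26} $$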

This is in the standard form for matrix multiplication, recalling the subscripts are ordered as $f_{rc}$ for row-column.

$$ \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{pmatrix} \tag{7.27} $$

The computation of $h_{12}$ from Eq. (7.26) uses the first row of $(f)$ and the second column of $(g)$:

$$ h_{12} = f_{11} g_{12} + f_{12} g_{22} + f_{13} g_{32} $$

Matrix multiplication is just the component representation of the composition of two functions, Eq. (7.26), and
there's nothing here that restricts this to three dimensions. In Eq. (7.25) I may have made it look too easy. If you try
to reproduce this without looking, the odds are that you will not get the indices to match up as nicely as you see there.
Remember: When an index is summed it is a dummy, and you are free to relabel it as anything you want. You can use
this fact to make the indices come out neatly.
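To see that the index rule $h_{ki} = \sum_j f_{kj} g_{ji}$ really does represent "apply $g$, then apply $f$," you can try it on numbers. The Python sketch below uses two arbitrary integer matrices (chosen here only for illustration) and compares the product matrix acting on a vector with the two matrices acting in succession.

```python
def matmul(F, G):
    """H[k][i] = sum_j F[k][j] G[j][i], Eq. (7.26)."""
    return [[sum(F[k][j] * G[j][i] for j in range(len(G)))
             for i in range(len(G[0]))]
            for k in range(len(F))]

def apply(F, v):
    """u_k = sum_i F[k][i] v_i, Eq. (7.8)."""
    return [sum(F[k][i] * v[i] for i in range(len(v))) for k in range(len(F))]

# Arbitrary illustrative components of two operators f and g.
F = [[1, 2, 0],
     [0, 1, -1],
     [3, 0, 2]]
G = [[0, 1, 1],
     [2, 0, 1],
     [1, -1, 0]]
v = [1, 2, 3]

H = matmul(F, G)                       # components of h = f o g
via_product = apply(H, v)              # h(v) from the product matrix
via_composition = apply(F, apply(G, v))  # f(g(v)) done in two steps

# The single entry h_12 written out as in the text (indices start at 0 in code).
h12 = F[0][0] * G[0][1] + F[0][1] * G[1][1] + F[0][2] * G[2][1]
print(H, via_product, via_composition, h12)
```

Swapping the order, `matmul(G, F)`, gives a different matrix for these two examples, which is the component version of the fact that $f \circ g$ and $g \circ f$ are generally different functions.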

Composition of Rotations 

In the first example, rotating vectors in the plane, the operator that rotates every vector by the angle $\alpha$ has components

$$ (R_\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \tag{7.28} $$

What happens if you do two such transformations, one by $\alpha$ and one by $\beta$? The result had better be a total rotation by
$\alpha + \beta$. One function, $R_\beta$ is followed by the second function $R_\alpha$ and the composition is

$$ R_{\alpha + \beta} = R_\alpha R_\beta $$

This is mirrored in the components of these operators, so the matrices must obey the same equation.

$$ \begin{pmatrix} \cos(\alpha + \beta) & -\sin(\alpha + \beta) \\ \sin(\alpha + \beta) & \cos(\alpha + \beta) \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} $$

Multiply the matrices on the right to get

$$ \begin{pmatrix} \cos\alpha \cos\beta - \sin\alpha \sin\beta & -\cos\alpha \sin\beta - \sin\alpha \cos\beta \\ \sin\alpha \cos\beta + \cos\alpha \sin\beta & \cos\alpha \cos\beta - \sin\alpha \sin\beta \end{pmatrix} \tag{7.29} $$

The respective components must agree, so this gives an immediate derivation of the formulas for the sine and cosine of
the sum of two angles. Cf. Eq. (3.8)
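A numerical spot check of $R_{\alpha+\beta} = R_\alpha R_\beta$ takes only a few lines of Python. The angles are arbitrary and the helper names `rotation` and `matmul` are assumptions of this sketch.

```python
import math

def rotation(alpha):
    """Component matrix of a counterclockwise rotation by alpha (radians)."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]

def matmul(F, G):
    """H[k][i] = sum_j F[k][j] G[j][i]."""
    return [[sum(F[k][j] * G[j][i] for j in range(len(G)))
             for i in range(len(G[0]))]
            for k in range(len(F))]

alpha, beta = 0.4, 1.1
total = rotation(alpha + beta)
product = matmul(rotation(alpha), rotation(beta))

max_diff = max(abs(total[i][j] - product[i][j]) for i in range(2) for j in range(2))
# Setting beta = -alpha should give the identity matrix.
undo = matmul(rotation(alpha), rotation(-alpha))
print(max_diff, undo)
```

The lower-left entry of the product is $\sin\alpha\cos\beta + \cos\alpha\sin\beta$, so the agreement of that one entry with `math.sin(alpha + beta)` is the sine addition formula evaluated at these two angles.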

7.5 Inverses 

The simplest operator is the one that does nothing. $f(\vec v) = \vec v$ for all values of the vector $\vec v$. This implies that $f(\vec e_1) = \vec e_1$
and similarly for all the other elements of the basis, so the matrix of its components is diagonal. The $2 \times 2$ matrix is
explicitly the identity matrix

$$ (I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{or in index notation}\quad \delta_{ij} = \begin{cases} 1 & (\text{if } i = j) \\ 0 & (\text{if } i \ne j) \end{cases} \tag{7.30} $$

and the index notation is completely general, not depending on whether you're dealing with two dimensions or many
more. Unfortunately the words "inertia" and "identity" both start with the letter "I" and this symbol is used for both
operators. Live with it. The $\delta$ symbol in this equation is the Kronecker delta, which is very handy.

The inverse of an operator is defined in terms of Eq. (7.24), the composition of functions. If the composition of 
two functions takes you to the identity operator, one function is said to be the inverse of the other. This is no different 
from the way you look at ordinary real valued functions. The exponential and the logarithm are inverse to each other 
because* 

ln(e x ) = x for all x. 

For the rotation operator, Eq. (7.10), the inverse is obviously going to be rotation by the same angle in the opposite 
direction. 

R a R- a = I 

Because the matrix components of these operators mirror the original operators, this equation must also hold for the 
corresponding components, as in Eqs. (7.27) and (7.29). Set $\beta = -\alpha$ in (7.29) and you get the identity matrix.

In an equation such as Eq. (7.7), or its component version Eqs. (7.8) or (7.9), if you want to solve for the vector 
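$\vec v$, you are asking for the inverse of the function $f$.

$$\vec u = f(\vec v) \quad\text{implies}\quad \vec v = f^{-1}(\vec u)$$

The translation of these equations into components is Eq. (7.9)

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$$

which implies

$$\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \frac{1}{f_{11}f_{22} - f_{12}f_{21}} \begin{pmatrix} f_{22} & -f_{12} \\ -f_{21} & f_{11} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \tag{7.31}$$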


The verification that these are the components of the inverse is no more than simply multiplying the two matrices and 
seeing that you get the identity matrix. 
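
That verification is easy to hand to a computer. Here is a minimal sketch, assuming matrices are stored as nested lists; the function names and the sample matrix are hypothetical choices of mine, and the inverse is the 2×2 formula of Eq. (7.31).

```python
def inverse_2x2(f):
    """Inverse of a 2x2 component matrix, following Eq. (7.31)."""
    (f11, f12), (f21, f22) = f
    det = f11 * f22 - f12 * f21
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[f22 / det, -f12 / det],
            [-f21 / det, f11 / det]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

f = [[2.0, 1.0], [5.0, 3.0]]      # sample operator (an assumption)
f_inv = inverse_2x2(f)
left = matmul(f_inv, f)           # should be the identity matrix
right = matmul(f, f_inv)          # the product in the other order
print(f_inv, left, right)
```

The explicit check on `det == 0` marks the one case where the formula has nothing to offer: an operator that squeezes the plane down to a line or a point has no inverse.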


* The reverse, $e^{\ln x}$, works just for positive $x$, unless you recall that the logarithm of a negative number is complex.
Then it works there too. This sort of question doesn't occur with finite dimensional matrices.

7.6 Rotations, 3-d

In three dimensions there are of course more basis vectors to rotate. Start by rotating vectors about the axes and it is 
nothing more than the two-dimensional problem of Eq. (7.10) done three times. You do have to be careful about signs, 
but not much more — as long as you draw careful pictures! 




The basis vectors are drawn in the three pictures: $\vec e_1 = \hat x$, $\vec e_2 = \hat y$, $\vec e_3 = \hat z$.

In the first sketch, rotate vectors by the angle $\alpha$ about the $x$-axis. In the second case, rotate by the angle $\beta$ about
the $y$-axis, and in the third case, rotate by the angle $\gamma$ about the $z$-axis. In the first case, the $\vec e_1$ is left alone. The $\vec e_2$
picks up a little positive $\vec e_3$, and the $\vec e_3$ picks up a little negative $\vec e_2$.

$$R_{\alpha\vec e_1}(\vec e_1) = \vec e_1, \quad R_{\alpha\vec e_1}(\vec e_2) = \vec e_2\cos\alpha + \vec e_3\sin\alpha, \quad R_{\alpha\vec e_1}(\vec e_3) = \vec e_3\cos\alpha - \vec e_2\sin\alpha \tag{7.32}$$

Here the notation $R_{\vec\theta}$ represents the function prescribing a rotation by $\theta$ about the axis pointing along $\hat\theta$. These equations
are the same as Eqs. (7.11) and (7.12).

The corresponding equations for the other two rotations are now easy to write down:

$$R_{\beta\vec e_2}(\vec e_1) = \vec e_1\cos\beta - \vec e_3\sin\beta, \quad R_{\beta\vec e_2}(\vec e_2) = \vec e_2, \quad R_{\beta\vec e_2}(\vec e_3) = \vec e_1\sin\beta + \vec e_3\cos\beta \tag{7.33}$$

$$R_{\gamma\vec e_3}(\vec e_1) = \vec e_1\cos\gamma + \vec e_2\sin\gamma, \quad R_{\gamma\vec e_3}(\vec e_2) = -\vec e_1\sin\gamma + \vec e_2\cos\gamma, \quad R_{\gamma\vec e_3}(\vec e_3) = \vec e_3 \tag{7.34}$$

From these vector equations you immediately read the columns of the matrices of the components of the operators as in
Eq. (7.6).

$$(R_{\alpha\vec e_1}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}, \quad (R_{\beta\vec e_2}) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \quad (R_{\gamma\vec e_3}) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{7.35}$$

As a check on the algebra, did you see if the rotated basis vectors from any of the three sets of equations (7.32)-(7.34)
are still orthogonal sets?

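Do these rotation operations commute? No. Try the case of two 90° rotations to see. Rotate by this angle about
the $x$-axis, then by the same angle about the $y$-axis.

$$\bigl(R_{\vec e_2\pi/2}\bigr)\bigl(R_{\vec e_1\pi/2}\bigr) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix} \tag{7.36}$$

In the reverse order, for which the rotation about the $y$-axis is done first, these are

$$\bigl(R_{\vec e_1\pi/2}\bigr)\bigl(R_{\vec e_2\pi/2}\bigr) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \tag{7.37}$$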


Translate these operations into the movement of a physical object. Take the same $x$-$y$-$z$ coordinate system as in
this section, with $x$ pointing toward you, $y$ to your right and $z$ up. Pick up a book with the cover toward you so that
you can read it. Now do the operation $R_{\vec e_1\pi/2}$ on it so that the cover still faces you but the top is to your left. Next do
$R_{\vec e_2\pi/2}$ and the book is face down with the top still to your left. See problem 7.57 for an algebraic version of this.

Start over with the cover toward you as before and do $R_{\vec e_2\pi/2}$ so that the top is toward you and the face is down.
Now do the other operation $R_{\vec e_1\pi/2}$ and the top is toward you with the cover facing right, a different result. Do these
physical results agree with the matrix products of the last two equations? For example, what happens to the vector
sticking out of the cover, initially the column matrix (1 0 0)? This is something that you cannot simply read. You
have to do the experiment for yourself.
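
After you have done the experiment with a real book, a few lines of code can confirm the bookkeeping. This is a minimal sketch under my own naming (`rot_x`, `rot_y`, `matmul`, `rounded` are hypothetical helpers); the matrices themselves are the ones in Eq. (7.35), evaluated at 90°.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rot_x(angle):
    """Components of a rotation by `angle` about the x-axis, Eq. (7.35)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(angle):
    """Components of a rotation by `angle` about the y-axis, Eq. (7.35)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rounded(m):
    """Round entries to integers; fine here since every entry is 0 or +-1."""
    return [[int(round(x)) for x in row] for row in m]

quarter = math.pi / 2
x_then_y = rounded(matmul(rot_y(quarter), rot_x(quarter)))   # Eq. (7.36)
y_then_x = rounded(matmul(rot_x(quarter), rot_y(quarter)))   # Eq. (7.37)

# The vector sticking out of the cover starts as (1, 0, 0), so its image
# is the first column of each product.
cover_x_then_y = [row[0] for row in x_then_y]
cover_y_then_x = [row[0] for row in y_then_x]
print(cover_x_then_y, cover_y_then_x)
```

The first column of the first product points along $-\hat z$ (cover face down) and the first column of the second points along $+\hat y$ (cover facing right), which is what the book in your hands should have told you.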


7.7 Areas, Volumes, Determinants 

In the two-dimensional example of arrows in the plane, look what happens to areas when an operator acts. The unit
square with corners at the origin and (0, 1), (1, 1), (1, 0) gets distorted into a parallelogram. The arrows from the origin
to every point in the square become arrows that fill out the parallelogram.

What is the area of this parallelogram?

I'll ask a more general question. (It isn't really, but it looks like it.) Start with any region in the plane, and say it
has area $A_1$. The operator takes all the vectors ending in this area into some new area of a size $A_2$, probably different
from the original. What is the ratio of the new area to the old one, $A_2/A_1$? How much does this transformation stretch
or squeeze the area? What isn't instantly obvious is that this ratio of areas depends on the operator alone, and not on
how you chose the initial region to be transformed. If you accept this for the moment, then you see that the question in
the previous paragraph, which started with the unit square and asked for the area into which it transformed, is the same
question as finding the ratio of the two more general areas. (Or the ratio of two volumes in three dimensions.) See the
end of the next section for a proof.

This ratio is called the determinant of the operator. 

The first example is the simplest. Rotations in the plane, $R_\alpha$. Because rotations leave area unchanged, this
determinant is one. For almost any other example you have to do some work. Use the component form to do the
computation. The basis vector $\vec e_1$ is transformed into the vector $f_{11}\vec e_1 + f_{21}\vec e_2$ with a similar expression for the image
of $\vec e_2$. You can use the cross product to compute the area of the parallelogram that these define. For another way, see
problem 7.3. This is

$$(f_{11}\vec e_1 + f_{21}\vec e_2) \times (f_{12}\vec e_1 + f_{22}\vec e_2) = (f_{11}f_{22} - f_{21}f_{12})\,\vec e_3 \tag{7.38}$$

The product in parentheses is the determinant of the transformation.

$$\det(f) = f_{11}f_{22} - f_{21}f_{12} \tag{7.39}$$
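
To see Eq. (7.39) as an honest area, you can transform the corners of the unit square and measure the image directly. The sketch below does that with the shoelace formula for the signed area of a polygon, which is my substitution for the cross product used above; the function names and the two sample operators are assumptions for illustration.

```python
def det_2x2(f):
    """Determinant from Eq. (7.39)."""
    return f[0][0] * f[1][1] - f[1][0] * f[0][1]

def apply(f, v):
    """Act with the component matrix f on the column (v[0], v[1])."""
    return (f[0][0] * v[0] + f[0][1] * v[1],
            f[1][0] * v[0] + f[1][1] * v[1])

def signed_area(polygon):
    """Shoelace formula; positive when the vertices run counterclockwise."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # counterclockwise corners

stretch = [[2.0, 1.0], [0.5, 3.0]]                # sample operator
area_stretch = signed_area([apply(stretch, p) for p in unit_square])

swap = [[0.0, 1.0], [1.0, 0.0]]                   # interchanges x and y
area_swap = signed_area([apply(swap, p) for p in unit_square])

print(area_stretch, det_2x2(stretch))
print(area_swap, det_2x2(swap))
```

The second operator leaves the size of the square alone but reverses the sense in which its boundary is traversed, so the signed area comes out negative.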


What if I had picked a different basis, maybe even one that isn't orthonormal? From the definition of the determinant it 
is a property of the operator and not of the particular basis and components you use to describe it, so you must get the 
same answer. But will the answer be the same simple formula (7.39) if I pick a different basis? Now that's a legitimate 
question. The answer is yes, and that fact will come out of the general computation of the determinant in a moment. 
[What is the determinant of Eq. (7.13)?]

The determinant can be either positive or negative. That tells you more than simply how the transformation alters
the area; it tells you whether it changes the orientation of the area. If you place a counterclockwise loop in the original 
area, does it remain counterclockwise in the image or is it reversed? In three dimensions, the corresponding plus or minus 
sign for the determinant says that you're changing from a right-handed set of vectors to a left-handed one. What does 
that mean? Make an x-y-z coordinate system out of the thumb, index finger, and middle finger of your right hand. Now 
do it with your left hand. You cannot move one of these and put it on top of the other (unless you have very unusual 
joints). One is a mirror image of the other. 

The equation (7.39) is a special case of a rule that you've probably encountered elsewhere. You compute the 
determinant of a square array of numbers by some means such as expansion in minors or Gauss reduction. Here I've 
defined the determinant geometrically, and it has no obvious relation to the traditional numeric definition. They are the
same, and the reason for that comes by looking at how the area (or volume) of a parallelogram depends on the vectors
that make up its sides. The derivation is slightly involved, but no one step in it is hard. Along the way you will encounter
a new and important function: $\Lambda$.

Start with the basis $\vec e_1$, $\vec e_2$ and call the output of the transformation $\vec v_1 = f(\vec e_1)$ and $\vec v_2 = f(\vec e_2)$. The final area
is a function of these last two vectors, call it $\Lambda(\vec v_1, \vec v_2)$, and this function has two key properties:

$$\Lambda(\vec v, \vec v) = 0, \quad\text{and}\quad \Lambda(\vec v_1, \alpha\vec v_2 + \beta\vec v_3) = \alpha\Lambda(\vec v_1, \vec v_2) + \beta\Lambda(\vec v_1, \vec v_3) \tag{7.40}$$

That the area vanishes if the two sides are the same is obvious. That the area is a linear function of the vectors forming
the two sides is not so obvious. (It is linear in both arguments.) Part of the proof of linearity is easy:

$$\Lambda(\vec v_1, \alpha\vec v_2) = \alpha\Lambda(\vec v_1, \vec v_2)$$

simply says that if one side of the parallelogram remains fixed and the other changes by some factor, then the area
changes by that same factor. For the other part, $\Lambda(\vec v_1, \vec v_2 + \vec v_3)$, start with a picture and see if the area that this function
represents is the same as the sum of the two areas made from the vectors $\vec v_1$ & $\vec v_2$ and $\vec v_1$ & $\vec v_3$.

$\vec v_1$ & $\vec v_2$ form the area OCBA. $\vec v_1$ & $\vec v_3$ form the area OCED.

$\triangle$HDF $\cong$ $\triangle$JEG
$\triangle$HDO $\cong$ $\triangle$JEC
so
area HJGF = area DEGF = area OCBA
area OCJH = area OCED
add these equations:
area OCGF = area OCBA + area OCED

The last line is the statement that the sum of the areas of the two parallelograms is the area of the parallelogram formed
using the sum of the two vectors:

$$\Lambda(\vec v_1, \vec v_2 + \vec v_3) = \Lambda(\vec v_1, \vec v_2) + \Lambda(\vec v_1, \vec v_3)$$

This sort of function $\Lambda$, generalized to three dimensions, is characterized by

$$(1)\quad \Lambda(\alpha\vec v_1 + \beta\vec v_1{}', \vec v_2, \vec v_3) = \alpha\Lambda(\vec v_1, \vec v_2, \vec v_3) + \beta\Lambda(\vec v_1{}', \vec v_2, \vec v_3)$$

$$(2)\quad \Lambda(\vec v_1, \vec v_1, \vec v_3) = 0 \tag{7.41}$$

It is linear in each variable, and it vanishes if any two arguments are equal. I've written it for three dimensions, but in 
N dimensions you have the same equations with N arguments, and these properties hold for all of them. 

Theorem: Up to an overall constant factor, this function is unique. 

An important result is that these assumptions imply the function is antisymmetric in any two arguments. Proof:

$$\Lambda(\vec v_1 + \vec v_2, \vec v_1 + \vec v_2, \vec v_3) = 0 = \Lambda(\vec v_1, \vec v_1, \vec v_3) + \Lambda(\vec v_1, \vec v_2, \vec v_3) + \Lambda(\vec v_2, \vec v_1, \vec v_3) + \Lambda(\vec v_2, \vec v_2, \vec v_3)$$

This is just the linearity property. Now the left side, and the 1st and 4th terms on the right, are zero because two
arguments are equal. The rest is

$$\Lambda(\vec v_1, \vec v_2, \vec v_3) + \Lambda(\vec v_2, \vec v_1, \vec v_3) = 0 \tag{7.42}$$

and this says that interchanging two arguments of $\Lambda$ changes the sign. (The reverse is true also. Assume antisymmetry
and deduce that it vanishes if two arguments are equal.)

I said that this function is unique up to a factor. Suppose that there are two of them: $\Lambda$ and $\Lambda'$. Now show for
some constant $\alpha$, that $\Lambda - \alpha\Lambda'$ is identically zero. To do this, take three independent vectors and evaluate the number
$\Lambda'(\vec v_a, \vec v_b, \vec v_c)$. There is some set of $\vec v$'s for which this is non-zero, otherwise $\Lambda'$ is identically zero and that's not much
fun. Now consider

$$\alpha = \frac{\Lambda(\vec v_a, \vec v_b, \vec v_c)}{\Lambda'(\vec v_a, \vec v_b, \vec v_c)} \quad\text{and define}\quad \Lambda_0 = \Lambda - \alpha\Lambda'$$

This function $\Lambda_0$ is zero for the special argument $(\vec v_a, \vec v_b, \vec v_c)$, and now I'll show why it is zero for all arguments. That
means that it is the zero function, and says that the two functions $\Lambda$ and $\Lambda'$ are proportional.

The vectors $(\vec v_a, \vec v_b, \vec v_c)$ are independent and there are three of them (in three dimensions). They are a basis. You
can write any vector as a linear combination of these. E.g.

$$\vec v_1 = A\vec v_a + B\vec v_b \quad\text{and}\quad \vec v_2 = C\vec v_a + D\vec v_b$$

Put these (and let's say $\vec v_c$) into $\Lambda_0$.

$$\Lambda_0(\vec v_1, \vec v_2, \vec v_c) = AC\,\Lambda_0(\vec v_a, \vec v_a, \vec v_c) + AD\,\Lambda_0(\vec v_a, \vec v_b, \vec v_c) + BC\,\Lambda_0(\vec v_b, \vec v_a, \vec v_c) + BD\,\Lambda_0(\vec v_b, \vec v_b, \vec v_c)$$

All these terms are zero. Any argument that you put into $\Lambda_0$ is a linear combination of $\vec v_a$, $\vec v_b$, and $\vec v_c$, and that means
that this demonstration extends to any set of vectors, which in turn means that $\Lambda_0$ vanishes for any arguments. It is
identically zero and that implies $\Lambda$ and $\Lambda'$ are, up to a constant overall factor, the same.

In N dimensions, a scalar-valued function of N vector variables, linear in each 
argument and antisymmetric under interchanging any pairs of arguments, is unique 
up to a factor. 

I've characterized this volume function $\Lambda$ by two simple properties, and surprisingly enough this is all you need
to compute it in terms of the components of the operator! With just this much information you can compute the
determinant of a transformation.

Recall: $\vec v_1$ has for its components the first column of the matrix for the components of $f$, and $\vec v_2$ forms the second
column. Adding any multiple of one vector to another leaves the volume alone. This is

$$\Lambda(\vec v_1, \vec v_2 + \alpha\vec v_1, \vec v_3) = \Lambda(\vec v_1, \vec v_2, \vec v_3) + \alpha\Lambda(\vec v_1, \vec v_1, \vec v_3) \tag{7.43}$$

and the last term is zero. Translate this into components. Use the common notation for a determinant, a square array
with vertical bars, but forget that you know how to compute this symbol! I'm going to use it simply as a notation to
keep track of vector manipulations. The numerical value will come out at the end as the computed value of a volume.

$\vec v_i = f(\vec e_i) = \sum_j f_{ji}\vec e_j$, then $\Lambda(\vec v_1, \vec v_2, \vec v_3) = \Lambda(\vec v_1, \vec v_2 + \alpha\vec v_1, \vec v_3) =$

$$\begin{vmatrix} f_{11} & f_{12} + \alpha f_{11} & f_{13} \\ f_{21} & f_{22} + \alpha f_{21} & f_{23} \\ f_{31} & f_{32} + \alpha f_{31} & f_{33} \end{vmatrix} = \begin{vmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{vmatrix} + \alpha \begin{vmatrix} f_{11} & f_{11} & f_{13} \\ f_{21} & f_{21} & f_{23} \\ f_{31} & f_{31} & f_{33} \end{vmatrix} = \begin{vmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{vmatrix}$$

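To evaluate this object, simply choose $\alpha$ to make the element $f_{12} + \alpha f_{11} = 0$. Then repeat the operation, adding a
multiple of the first column to the third, making the element $f_{13} + \beta f_{11} = 0$. This operation doesn't change the original
value of $\Lambda(\vec v_1, \vec v_2, \vec v_3)$.

$$\Lambda(\vec v_1, \vec v_2 + \alpha\vec v_1, \vec v_3 + \beta\vec v_1) = \begin{vmatrix} f_{11} & 0 & 0 \\ f_{21} & f_{22} + \alpha f_{21} & f_{23} + \beta f_{21} \\ f_{31} & f_{32} + \alpha f_{31} & f_{33} + \beta f_{31} \end{vmatrix} = \begin{vmatrix} f_{11} & 0 & 0 \\ f_{21} & f'_{22} & f'_{23} \\ f_{31} & f'_{32} & f'_{33} \end{vmatrix}$$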


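Repeat the process to eliminate $f'_{23}$, adding $\gamma\vec v_2{}'$ to the third argument, where $\gamma = -f'_{23}/f'_{22}$.

$$\begin{vmatrix} f_{11} & 0 & 0 \\ f_{21} & f'_{22} & f'_{23} \\ f_{31} & f'_{32} & f'_{33} \end{vmatrix} = \begin{vmatrix} f_{11} & 0 & 0 \\ f_{21} & f'_{22} & f'_{23} + \gamma f'_{22} \\ f_{31} & f'_{32} & f'_{33} + \gamma f'_{32} \end{vmatrix} = \begin{vmatrix} f_{11} & 0 & 0 \\ f_{21} & f'_{22} & 0 \\ f_{31} & f'_{32} & f''_{33} \end{vmatrix} \tag{7.44}$$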


Written in the last form, as a triangular array, the final result for the determinant does not depend on the elements $f_{21}$,
$f_{31}$, $f'_{32}$. They may as well be zero. Why? Just do the same sort of column operations, but working toward the left.
Eliminate $f_{31}$ and $f'_{32}$ by adding a constant times the third column to the first and second columns. Then eliminate $f_{21}$
by using the second column. You don't actually have to do this, you just have to recognize that it can be done so that
you can ignore the lower triangular part of the array.

Translate this back to the original vectors and $\Lambda$ is unchanged:

$$\Lambda(\vec v_1, \vec v_2, \vec v_3) = \Lambda(f_{11}\vec e_1, f'_{22}\vec e_2, f''_{33}\vec e_3) = f_{11}f'_{22}f''_{33}\,\Lambda(\vec e_1, \vec e_2, \vec e_3)$$

The volume of the original box is $\Lambda(\vec e_1, \vec e_2, \vec e_3)$, so the quotient of the new volume to the old one is

$$\det = f_{11}f'_{22}f''_{33} \tag{7.45}$$

The fact that $\Lambda$ is unique up to a constant factor doesn't matter. Do you want to measure volume in cubic feet, cubic
centimeters, or cubic light-years? This algorithm is called Gauss elimination. Its development started with the geometry
and used vector manipulations to recover what you may recognize from elsewhere as the traditional computed value of
the determinant.

Did I leave anything out in this computation of the determinant? Yes, one point. What if in Eq. (7.44) the
number $f'_{22} = 0$? You can't divide by it then. You can however interchange any two arguments of $\Lambda$, causing simply
a sign change. If this contingency occurs then you need only interchange the two columns to get a component of zero
where you want it. Just keep count of such switches whenever they occur.
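
The whole procedure, column operations plus the sign bookkeeping for switches, fits in a short routine. What follows is a minimal sketch rather than a production algorithm: the function name is hypothetical, the pivot test is an exact comparison with zero (a real numerical code would pick the largest available pivot), and the sample matrices are my own.

```python
def det_by_column_elimination(matrix):
    """Determinant by the column operations of Eqs. (7.43)-(7.45)."""
    a = [[float(x) for x in row] for row in matrix]   # work on a copy
    n = len(a)
    sign = 1.0
    for p in range(n):                 # p labels the pivot row and column
        # Look for a column at or to the right of p with a usable pivot.
        pivot_col = next((c for c in range(p, n) if a[p][c] != 0.0), None)
        if pivot_col is None:
            return 0.0                 # remaining columns are dependent
        if pivot_col != p:
            # Interchange two columns: two arguments of Lambda switch places.
            for r in range(n):
                a[r][p], a[r][pivot_col] = a[r][pivot_col], a[r][p]
            sign = -sign
        # Add multiples of column p to the later columns to clear row p.
        for c in range(p + 1, n):
            factor = -a[p][c] / a[p][p]
            for r in range(n):
                a[r][c] += factor * a[r][p]
    result = sign
    for p in range(n):
        result *= a[p][p]              # product of the diagonal, Eq. (7.45)
    return result

d2 = det_by_column_elimination([[1, 2], [3, 4]])
d3 = det_by_column_elimination([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
d_swap = det_by_column_elimination([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
print(d2, d3, d_swap)
```

The third sample has a zero in the upper left corner, so the routine has to interchange two columns before it can start, and the result picks up the corresponding minus sign.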


Trace 

There's a property closely related to the determinant of an operator. It's called the trace. If you have an operator $f$,
then consider the determinant of $M = I + \epsilon f$, where $I$ is the identity. This combination is very close to the identity if $\epsilon$
is small enough, so its determinant is very close to one. How close? The first order in $\epsilon$ is called the trace of $f$, or more
formally

$$\operatorname{Tr}(f) = \frac{d}{d\epsilon}\det(I + \epsilon f)\bigg|_{\epsilon = 0} \tag{7.46}$$

Express this in components for a two dimensional case, and

$$(f) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\Rightarrow\; \det(I + \epsilon f) = \det\begin{pmatrix} 1 + \epsilon a & \epsilon b \\ \epsilon c & 1 + \epsilon d \end{pmatrix} = (1 + \epsilon a)(1 + \epsilon d) - \epsilon^2 bc \tag{7.47}$$

The first order coefficient of $\epsilon$ is $a + d$, the sum of the diagonal elements of the matrix. This is the form of the result
in any dimension, and the proof involves carefully looking at the method of Gauss elimination for the determinant,
remembering at every step that you're looking for only the first order term in $\epsilon$. See problem 7.53.
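
You can watch the definition (7.46) at work numerically. The sketch below estimates the derivative with a central difference, which is my choice of method; the step size `eps`, the function names, and the sample matrix are assumptions for illustration.

```python
def det_2x2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[1][0] * m[0][1]

def identity_plus(f, eps):
    """Components of I + eps*f for a 2x2 matrix f."""
    return [[1 + eps * f[0][0], eps * f[0][1]],
            [eps * f[1][0], 1 + eps * f[1][1]]]

def trace_from_determinant(f, eps=1e-6):
    """Central-difference estimate of d/d(eps) det(I + eps f) at eps = 0."""
    plus = det_2x2(identity_plus(f, eps))
    minus = det_2x2(identity_plus(f, -eps))
    return (plus - minus) / (2 * eps)

f = [[1.0, 2.0], [3.0, 4.0]]           # sample operator (an assumption)
estimate = trace_from_determinant(f)
diagonal_sum = f[0][0] + f[1][1]
print(estimate, diagonal_sum)
```

In two dimensions $\det(I + \epsilon f)$ is a quadratic in $\epsilon$, so the central difference cancels the $\epsilon^2$ term and the estimate differs from $a + d$ only by roundoff.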

7.8 Matrices as Operators 

There's an important example of a vector space that I've avoided mentioning up to now. Example 5 in section 6.3 is the
set of n-tuples of numbers: $(a_1, a_2, \ldots, a_n)$. I can turn this on its side, call it a column matrix, and it forms a perfectly
good vector space. The functions (operators) on this vector space are the matrices themselves.

When you have a system of linear equations, you can translate this into the language of vectors.

$$ax + by = e \quad\text{and}\quad cx + dy = f \quad\Longleftrightarrow\quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}$$

Solving for $x$ and $y$ is inverting a matrix.

There's an aspect of this that may strike you as odd. This matrix is an operator on the vector space of column
matrices. What are the components of this operator? What? Isn't the matrix a set of components already? That
depends on your choice of basis. Take an example


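$$M = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad\text{with basis}\quad \vec e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \vec e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Compute the components as usual.

$$M\vec e_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} = 1\vec e_1 + 3\vec e_2$$

This says that the first column of the components of $M$ in this basis is $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$. What else would you expect? Now
select a different basis.

$$\vec e_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \vec e_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

Again compute the components.

$$M\vec e_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 1 \end{pmatrix} - 2\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 5\vec e_1 - 2\vec e_2$$

$$M\vec e_2 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix} = -1\vec e_1$$

The components of $M$ in this basis are

$$(M) = \begin{pmatrix} 5 & -1 \\ -2 & 0 \end{pmatrix}$$

It doesn't look at all the same, but it represents the same operator. Does this matrix have the same determinant, using
Eq. (7.39)?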
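
The recipe "act on each basis vector, then expand the result in that same basis" can be written out directly. This is a minimal sketch with hypothetical function names; it finds the expansion coefficients by Cramer's rule, which is my choice for a 2×2 problem and not something the text prescribes.

```python
def apply(m, v):
    """Act with the 2x2 matrix m on the column v."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def components_in_basis(w, e1, e2):
    """Solve w = c1*e1 + c2*e2 for (c1, c2) by Cramer's rule."""
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("e1 and e2 are not independent")
    c1 = (w[0] * e2[1] - e2[0] * w[1]) / det
    c2 = (e1[0] * w[1] - w[0] * e1[1]) / det
    return [c1, c2]

def operator_components(m, e1, e2):
    """Columns are the expansions of m(e1) and m(e2) in the basis e1, e2."""
    col1 = components_in_basis(apply(m, e1), e1, e2)
    col2 = components_in_basis(apply(m, e2), e1, e2)
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

def det_2x2(f):
    """Determinant from Eq. (7.39)."""
    return f[0][0] * f[1][1] - f[1][0] * f[0][1]

M = [[1, 2], [3, 4]]
in_old_basis = operator_components(M, [1, 0], [0, 1])
in_new_basis = operator_components(M, [1, 1], [1, -1])
print(in_old_basis, in_new_basis)
print(det_2x2(in_old_basis), det_2x2(in_new_basis))
```

Both sets of components give the same determinant, as they must if the determinant belongs to the operator and not to the basis.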

Determinant of Composition 

If you do one linear transformation followed by another one, that is the composition of the two functions, each operator 
will then have its own determinant. What is the determinant of the composition? Let the operators be F and G. One 
of them changes areas by a scale factor det (F) and the other ratio of areas is det(G). If you use the composition of the 
two functions, FG or GF, the overall ratio of areas from the start to the finish will be the same: 



$$\det(FG) = \det(F)\cdot\det(G) = \det(G)\cdot\det(F) = \det(GF) \tag{7.48}$$

Recall that the determinant measures the ratio of areas for any input area, not just a square; it can be a parallelogram.
The overall ratio is the product of the individual ratios, $\det(F)\det(G)$. The product of these two numbers is the total
ratio of a new area to the original area and it is independent of the order of $F$ and $G$, so the determinant of the
composition of the functions is also independent of order.

Now what about the statement that the definition of the determinant doesn't depend on the original area that
you start with? To show this takes a couple of steps. First, start with a square that's not at the origin. You can always
picture it as a piece of a square that is at the origin. The shaded square that is 1/16 the area of the big square goes
over to a parallelogram that's 1/16 the area of the big parallelogram. Same ratio.



An arbitrary shape can be divided into a lot of squares. That's how you do an integral. The image of the whole 
area is distorted, but it retains the fact that a square that was inside the original area will become a parallelogram that 
is inside the new area. In the limit as the number of squares goes to infinity you still maintain the same ratio of areas 
as for the single original square.

7.9 Eigenvalues and Eigenvectors

There is a particularly important basis for an operator, the basis in which the components form a diagonal matrix. Such
a basis almost always* exists, and it's easy to see from the definition as usual just what this basis must be.

$$f(\vec e_i) = \sum_{k=1}^{N} f_{ki}\vec e_k$$

* See section 7.12.

To be diagonal simply means that $f_{ki} = 0$ for all $i \ne k$, and that in turn means that all but one term in the sum
disappears. This defining equation reduces to

$$f(\vec e_i) = f_{ii}\vec e_i \quad\text{(with no sum this time)} \tag{7.49}$$

This is called an eigenvalue equation. It says that for any one of these special vectors, the operator $f$ on it returns a
scalar multiple of that same vector. These multiples are called the eigenvalues, and the corresponding vectors are called 
the eigenvectors. The eigenvalues are then the diagonal elements of the matrix in this basis. 

The inertia tensor is the function that relates the angular momentum of a rigid body to its angular velocity. The
axis of rotation is defined by those points in the rotating body that aren't moving, and the vector $\vec\omega$ lies along that
line. The angular momentum is computed from Eq. (7.3) and when you've done all those vector products and integrals
you can't really expect the angular momentum to line up with $\vec\omega$ unless there is some exceptional reason for it. As the
body rotates around the $\vec\omega$ axis, $\vec L$ will be carried with it, making $\vec L$ rotate about the direction of $\vec\omega$. The vector $\vec L$ is
time-dependent and that implies there will be a torque necessary to keep it going, $\vec\tau = d\vec L/dt$. Because $\vec L$ is rotating with
frequency $\omega$, this rotating torque will be felt as a vibration at this rotation frequency. If however the angular momentum
happens to be parallel to the angular velocity, the angular momentum will not be changing; $d\vec L/dt = 0$ and the torque
$\vec\tau = d\vec L/dt$ will be zero, implying that the vibrations will be absent. Have you ever taken your car in for servicing and
asked the mechanic to make the angular momentum and the angular velocity vectors of the wheels parallel? It's called
wheel-alignment.

How do you compute these eigenvectors? Just move everything to the left side of the preceding equation.

$$f(\vec e_i) - f_{ii}\vec e_i = 0, \quad\text{or}\quad (f - f_{ii}I)\,\vec e_i = 0$$

$I$ is the identity operator, output equals input. This notation is cumbersome. I'll change it.

$$f(\vec v) = \lambda\vec v \quad\Longleftrightarrow\quad (f - \lambda I)\,\vec v = 0 \tag{7.50}$$

$\lambda$ is the eigenvalue and $\vec v$ is the eigenvector. This operator $(f - \lambda I)$ takes some non-zero vector into the zero vector.
In two dimensions then it will squeeze an area down to a line or a point. In three dimensions it will squeeze a volume
down to an area (or a line or a point). In any case the ratio of the final area (or volume) to the initial area (or volume)
is zero. That says the determinant is zero, and that's the key to computing the eigenvectors. Figure out which $\lambda$'s will
make this determinant vanish.

Look back at section 4.9 and you'll see that the analysis there closely parallels what I'm doing here. In that case I
didn't use the language of matrices or operators, but was asking about the possible solutions of two simultaneous linear
equations.

$$ax + by = 0 \quad\text{and}\quad cx + dy = 0, \quad\text{or}\quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

The explicit algebra there led to the conclusion that there can be a non-zero solution $(x, y)$ to the two equations only if
the determinant of the coefficients vanishes, $ad - bc = 0$, and that's the same thing that I'm looking for here: a non-zero
vector solution to Eq. (7.50).

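Write the problem in terms of components, and of course you aren't yet in the basis where the matrix is diagonal.
If you were, you're already done. The defining equation is $f(\vec v) = \lambda\vec v$, and in components this reads

$$\sum_i f_{ki}v_i = \lambda v_k, \quad\text{or}\quad \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \lambda \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$$

Here I arbitrarily wrote the equation for three dimensions. That will change with the problem. Put everything on the
left side and insert the components of the identity, the unit matrix.

$$\left[ \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right] \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = 0 \tag{7.51}$$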


The one way that this has a non-zero solution for the vector $\vec v$ is for the determinant of the whole matrix on the left-hand
side to be zero. This equation is called the characteristic equation of the matrix, and in the example here that's a cubic
equation in $\lambda$. If it has all distinct roots, no double roots, then you're guaranteed that this procedure will work, and you
will be able to find a basis in which the components form a diagonal matrix. If this equation has a multiple root then
there is no guarantee. It may work, but it may not; you have to look closer. See section 7.12. If the operator has certain
symmetry properties then it's guaranteed to work. For example, the symmetry property found in problem 7.16 is enough
to ensure that you can find a basis in which the matrix for the inertia tensor is diagonal. It is even an orthogonal basis
in that case.

Example of Eigenvectors 

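To keep the algebra to a minimum, I'll work in two dimensions and will specify an arbitrary but simple example:

$$f(\vec e_1) = 2\vec e_1 + \vec e_2, \quad f(\vec e_2) = 2\vec e_2 + \vec e_1 \quad\text{with components}\quad (M) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \tag{7.52}$$

The eigenvalue equation is, in component form

$$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \lambda \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \quad\text{or}\quad \left[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right] \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0 \tag{7.53}$$

The condition that there be a non-zero solution to this is

$$\det\left[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right] = 0 = (2 - \lambda)^2 - 1$$

The solutions to this quadratic are $\lambda = 1, 3$. For these values then, the apparently two equations for the two unknowns
$v_1$ and $v_2$ are really one equation. The other is not independent. Solve this single equation in each case. Take the first
of the two linear equations for $v_1$ and $v_2$ as defined by Eq. (7.53).

$$2v_1 + v_2 = \lambda v_1$$

$$\lambda = 1 \text{ implies } v_2 = -v_1, \qquad \lambda = 3 \text{ implies } v_2 = v_1$$

The two new basis vectors are then

$$\vec e_1{}' = (\vec e_1 - \vec e_2) \quad\text{and}\quad \vec e_2{}' = (\vec e_1 + \vec e_2) \tag{7.54}$$

and in this basis the matrix of components is the diagonal matrix of eigenvalues.

$$\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}$$

If you like to keep your basis vectors normalized, you may prefer to say that the new basis is $(\vec e_1 - \vec e_2)/\sqrt2$ and $(\vec e_1 + \vec e_2)/\sqrt2$.
The eigenvalues are the same, so the new matrix is the same.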
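
The steps of this example translate into a few lines of code. This is a minimal sketch for real 2×2 matrices only, with hypothetical function names: it solves the characteristic quadratic $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$ and then reads an eigenvector off the first of the two linear equations, assuming the off-diagonal element $b$ is not zero.

```python
import math

def eigenvalues_2x2(m):
    """Real roots of det(m - lambda*I) = 0, smaller root first."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

def eigenvector_2x2(m, lam):
    """Solve (a - lam)*v1 + b*v2 = 0, the first row of Eq. (7.53)."""
    (a, b), _ = m
    if b == 0:
        raise ValueError("this shortcut assumes a non-zero off-diagonal b")
    return [b, lam - a]

M = [[2, 1], [1, 2]]                  # components from Eq. (7.52)
lam1, lam2 = eigenvalues_2x2(M)
v_first = eigenvector_2x2(M, lam1)
v_second = eigenvector_2x2(M, lam2)
print(lam1, v_first)
print(lam2, v_second)
```

The eigenvectors come out as multiples of $(1, -1)$ and $(1, 1)$, matching Eq. (7.54); their overall scale is arbitrary, which is why normalizing them is a matter of taste.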

Example: Coupled Oscillators 

Another example drawn from physics: Two masses are connected to a set of springs and fastened between two rigid
walls. This is a problem that appeared in chapter 4, Eq. (4.45).

$$m_1\,d^2x_1/dt^2 = -k_1x_1 - k_3(x_1 - x_2), \quad\text{and}\quad m_2\,d^2x_2/dt^2 = -k_2x_2 - k_3(x_2 - x_1)$$

The exponential form of the solution was

$$x_1(t) = Ae^{i\omega t}, \quad x_2(t) = Be^{i\omega t}$$

The algebraic equations that you get by substituting these into the differential equations are a pair of linear
equations for $A$ and $B$, Eq. (4.47). In matrix form these equations are, after rearranging some minus signs,

$$\begin{pmatrix} k_1 + k_3 & -k_3 \\ -k_3 & k_2 + k_3 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \omega^2 \begin{pmatrix} m_1 & 0 \\ 0 & m_2 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix}$$

You can make it look more like the previous example with some further arrangement

$$\left[ \begin{pmatrix} k_1 + k_3 & -k_3 \\ -k_3 & k_2 + k_3 \end{pmatrix} - \omega^2 \begin{pmatrix} m_1 & 0 \\ 0 & m_2 \end{pmatrix} \right] \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

The matrix on the left side maps the column matrix to zero. That can happen only if the matrix has zero determinant
(or the column matrix is zero). If you write out the determinant of this 2×2 matrix you have a quadratic equation in
$\omega^2$. It's simple but messy, so rather than looking first at the general case, look at a special case with more symmetry.
Take $m_1 = m_2 = m$ and $k_1 = k_2$.


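$$\det\left[ \begin{pmatrix} k_1 + k_3 & -k_3 \\ -k_3 & k_1 + k_3 \end{pmatrix} - \omega^2 m \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right] = 0 = (k_1 + k_3 - m\omega^2)^2 - k_3^2$$

This is now so simple that you don't even need the quadratic formula; it factors directly.

$$(k_1 + k_3 - m\omega^2 - k_3)(k_1 + k_3 - m\omega^2 + k_3) = 0$$

The only way that the product of two numbers is zero is if one of the numbers is zero, so either

$$k_1 - m\omega^2 = 0 \quad\text{or}\quad k_1 + 2k_3 - m\omega^2 = 0$$

This determines two possible frequencies of oscillation.

$$\omega_1^2 = \frac{k_1}{m} \quad\text{and}\quad \omega_2^2 = \frac{k_1 + 2k_3}{m}$$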


You're not done yet; these are just the eigenvalues. You still have to find the eigenvectors and then go back to apply 
them to the original problem. This is F = ma after all. Look back to section 4.10 for the development of the solutions. 
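
For the general case that I called simple but messy, a computer doesn't mind the mess. The sketch below expands the determinant into the quadratic $m_1m_2\,w^2 - [m_1(k_2+k_3) + m_2(k_1+k_3)]\,w + (k_1+k_3)(k_2+k_3) - k_3^2 = 0$ for $w = \omega^2$ and solves it; the function name and the numerical values of the masses and spring constants are assumptions chosen only to test the symmetric special case.

```python
import math

def squared_frequencies(m1, m2, k1, k2, k3):
    """Roots w = omega^2 of det[(K) - w (M)] = 0 for the two-mass system."""
    a = m1 * m2
    b = -(m1 * (k2 + k3) + m2 * (k1 + k3))
    c = (k1 + k3) * (k2 + k3) - k3 ** 2
    # For positive masses the discriminant equals
    # (m1*(k2+k3) - m2*(k1+k3))**2 + 4*m1*m2*k3**2, so it is never negative.
    disc = b * b - 4 * a * c
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))

# Symmetric special case: m1 = m2 = m and k1 = k2.
m, k1, k3 = 2.0, 3.0, 5.0             # arbitrary sample values
w_low, w_high = squared_frequencies(m, m, k1, k1, k3)
print(w_low, k1 / m)
print(w_high, (k1 + 2 * k3) / m)
```

Pushing the parameters to their limits is a good test here too: set $k_3 = 0$ in your head and the two masses decouple, each oscillating on its own spring.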

7.10 Change of Basis 

In many problems in physics and mathematics, the correct choice of basis can enormously simplify a problem. Sometimes 
the obvious choice of a basis turns out in the end not to be the best choice, and you then face the question: Do you start 
over with a new basis, or can you use the work that you’ve already done to transform everything into the new basis? 

For linear transformations, this becomes the problem of computing the components of an operator in a new basis
in terms of its components in the old basis.

First: Review how to do this for vector components, something that ought to be easy to do. The equation (7.5)
defines the components with respect to a basis, any basis. If I have a second proposed basis, then by the definition of
the word basis, every vector in that second basis can be written as a linear combination of the vectors in the first basis.
I'll call the vectors in the first basis $\vec e_i$ and those in the second basis $\vec e_i{}'$; for example in the plane you could have

$$\vec e_1 = \hat x, \quad \vec e_2 = \hat y, \quad\text{and}\quad \vec e_1{}' = 2\hat x + 0.5\hat y, \quad \vec e_2{}' = 0.5\hat x + 2\hat y \tag{7.55}$$

Each vector $\vec e_i{}'$ is a linear combination* of the original basis vectors:

$$\vec e_i{}' = S(\vec e_i) = \sum_j S_{ji}\vec e_j \tag{7.56}$$

This follows the standard notation of Eq. (7.6); you have to put the indices in this order in order to make the notation
come out right in the end. One vector expressed in two different bases is still one vector, so

$$\vec v = \sum_i v_i{}'\,\vec e_i{}' = \sum_i v_i\vec e_i$$

and I'm using the fairly standard notation of $v_i{}'$ for the $i$-th component of the vector $\vec v$ with respect to the second basis.
Now insert the relation between the bases from the preceding equation (7.56).

$$\vec v = \sum_i v_i{}' \sum_j S_{ji}\vec e_j = \sum_j \Bigl( \sum_i S_{ji}v_i{}' \Bigr) \vec e_j = \sum_j v_j\vec e_j$$

and this used the standard trick of changing the last dummy label of summation from $i$ to $j$ so that it is easy to compare
the components.

$$\sum_i S_{ji}v_i{}' = v_j \quad\text{or in matrix notation}\quad (S)(v') = (v), \;\Longrightarrow\; (v') = (S)^{-1}(v)$$

* There are two possible conventions here. You can write $\vec e_i{}'$ in terms of the $\vec e_i$, calling the coefficients $S_{ji}$, or you
can do the reverse and call those components $S_{ji}$. [$\vec e_i = S(\vec e_i{}')$] Naturally, both conventions are in common use. The
reverse convention will interchange the roles of the matrices $S$ and $S^{-1}$ in what follows.

Similarity Transformations

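Now use the definition of the components of an operator to get the components in the new basis.

$$f(\vec e_i{}') = \sum_j f'_{ji}\,\vec e_j{}'$$

$$f\Bigl( \sum_j S_{ji}\vec e_j \Bigr) = \sum_j S_{ji}f(\vec e_j) = \sum_j S_{ji} \sum_k f_{kj}\vec e_k = \sum_j f'_{ji} \sum_k S_{kj}\vec e_k$$

The final equation comes from the preceding line. The coefficients of $\vec e_k$ must agree on the two sides of the equation.

$$\sum_j S_{ji}f_{kj} = \sum_j f'_{ji}S_{kj}$$

Now rearrange this in order to place the indices in their conventional row, column order.

$$\sum_j S_{kj}f'_{ji} = \sum_j f_{kj}S_{ji}$$

$$\begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} f'_{11} & f'_{12} \\ f'_{21} & f'_{22} \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \tag{7.57}$$

In turn, this matrix equation is usually written in terms of the inverse matrix of $S$,

$$(S)(f') = (f)(S) \quad\text{is}\quad (f') = (S)^{-1}(f)(S) \tag{7.58}$$

and this is called a similarity transformation. For the example Eq. (7.55) this is

$$\vec e_1{}' = 2\hat x + 0.5\hat y = S_{11}\vec e_1 + S_{21}\vec e_2$$

which determines the first column of the matrix $(S)$, then $\vec e_2{}'$ determines the second column.

$$(S) = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 2 \end{pmatrix} \quad\text{then}\quad (S)^{-1} = \frac{1}{3.75} \begin{pmatrix} 2 & -0.5 \\ -0.5 & 2 \end{pmatrix}$$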
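
Here is a small numerical illustration of the rule $(v') = (S)^{-1}(v)$ for the basis of Eq. (7.55). It is a minimal sketch: the function names and the sample vector $(1, 1)$ are my own choices, and the inverse is computed from the 2×2 formula of Eq. (7.31).

```python
def inverse_2x2(s):
    """Inverse of a 2x2 matrix, following Eq. (7.31)."""
    (s11, s12), (s21, s22) = s
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("the proposed basis vectors are not independent")
    return [[s22 / det, -s12 / det],
            [-s21 / det, s11 / det]]

def apply(m, v):
    """Act with the 2x2 matrix m on the column v."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

S = [[2.0, 0.5], [0.5, 2.0]]          # columns are e1' and e2' of Eq. (7.55)
v = [1.0, 1.0]                        # components in the x, y basis (sample)

v_prime = apply(inverse_2x2(S), v)    # components in the primed basis
v_back = apply(S, v_prime)            # (S)(v') should return the original
print(v_prime, v_back)
```

For this vector the new components are both 0.4, and you can check by hand that $0.4\,\vec e_1{}' + 0.4\,\vec e_2{}' = \hat x + \hat y$.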


Eigenvectors 

In defining eigenvalues and eigenvectors I pointed out the utility of having a basis in which the components of an operator 
form a diagonal matrix. Finding the non-zero solutions to Eq. (7.50) is then the way to find the basis in which this
holds. Now I’ve spent time showing that you can find a matrix in a new basis by using a similarity transformation. Is 
there a relationship between these two subjects? Another way to ask the question: I've solved the problem to find all 
the eigenvectors and eigenvalues, so what is the similarity transformation that accomplishes the change of basis (and 
why is it necessary to know it if I already know that the transformed, diagonal matrix is just the set of eigenvalues, and 
I already know them)? 

For the last question, the simplest answer is that you don’t need to know the explicit transformation once you 
already know the answer. It is however useful to know that it exists and how to construct it. If it exists — I’ll come 
back to that presently. Certain manipulations are more easily done in terms of similarity transformations, so you ought to 
know how they are constructed, especially because almost all the work in constructing them is done when you've found 
the eigenvectors. 

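The equation (7.57) tells you the answer. Suppose that you want the transformed matrix to be diagonal. That
means that $f'_{12} = 0$ and $f'_{21} = 0$. Write out the first column of the product on the right.

$$\begin{pmatrix} f_{11} & f_{12}\\ f_{21} & f_{22}\end{pmatrix}\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix} \longrightarrow \begin{pmatrix} f_{11} & f_{12}\\ f_{21} & f_{22}\end{pmatrix}\begin{pmatrix} S_{11}\\ S_{21}\end{pmatrix}$$

This equals the first column on the left of the same equation

$$f'_{11}\begin{pmatrix} S_{11}\\ S_{21}\end{pmatrix}$$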

This is the eigenvector equation that you’ve supposedly already solved. The first column of the component matrix of 
the similarity transformation is simply the set of components of the first eigenvector. When you write out the second 
column of Eq. (7.57) you'll see that it's the defining equation for the second eigenvector. You already know these, so 
you can immediately write down the matrix for the similarity transformation. 

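For the example Eq. (7.52) the eigenvectors are given in Eq. (7.54). In components these are

$$\vec e\,'_1 \to \frac{1}{\sqrt2}\begin{pmatrix}1\\-1\end{pmatrix}, \quad\text{and}\quad \vec e\,'_2 \to \frac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix}, \quad\text{implying}\quad S = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\-1 & 1\end{pmatrix}$$

The inverse to this matrix is

$$S^{-1} = \frac{1}{\sqrt2}\begin{pmatrix}1 & -1\\1 & 1\end{pmatrix}$$

You should verify that $S^{-1}MS$ is diagonal.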
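The verification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix `M` below is a stand-in chosen because any symmetric matrix of the form [[a, b], [b, a]] has eigenvectors along (1, -1) and (1, 1); it is not copied from Eq. (7.52), so substitute the components from that equation if you want the check for the text's own example.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r = 1.0 / math.sqrt(2.0)
S = [[r, r], [-r, r]]        # columns: eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2)
S_inv = [[r, -r], [r, r]]    # S is orthogonal, so its inverse is its transpose

M = [[2.0, 1.0], [1.0, 2.0]]  # assumed example matrix with these eigenvectors

D = matmul(S_inv, matmul(M, S))  # should be diagonal, eigenvalues on the diagonal
print(D)
```

The diagonal entries of `D` appear in the same order as the eigenvectors were placed in the columns of `S`.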
7.11 Summation Convention

In all the manipulation of components of vectors and components of operators you have to do a lot of sums. There are 
so many sums over indices that a convention* was invented (by Einstein) to simplify the notation. 

A repeated index in a term is summed. 

Eq. (7.6) becomes $f(\vec e_i) = f_{ki}\,\vec e_k$.

Eq. (7.8) becomes $u_k = f_{ki}v_i$.

Eq. (7.26) becomes $h_{ki} = f_{kj}g_{ji}$.

$IM = M$ becomes $\delta_{ij}M_{jk} = M_{ik}$.

What if there are three identical indices in the same term? Then you made a mistake; that can't happen. What 
about Eq. (7.49)? That has three indices. Yes, and there I explicitly said that there is no sum. This sort of rare case 
you have to handle as an exception. 
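It can help to see the convention spelled out as loops. In this sketch (plain Python, function names of my choosing), every repeated index becomes a `sum` over that index and every free index becomes an index of the result.

```python
def contract_matrix_vector(f, v):
    """u_k = f_ki v_i: the repeated index i is summed, k is free."""
    n = len(v)
    return [sum(f[k][i] * v[i] for i in range(n)) for k in range(n)]

def contract_matrix_matrix(f, g):
    """h_ki = f_kj g_ji: the repeated index j is summed, k and i are free."""
    n = len(f)
    return [[sum(f[k][j] * g[j][i] for j in range(n)) for i in range(n)]
            for k in range(n)]

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

M = [[1, 2], [3, 4]]                                        # arbitrary test matrix
identity = [[delta(i, j) for j in range(2)] for i in range(2)]

print(contract_matrix_matrix(identity, M))   # delta_ij M_jk = M_ik
print(contract_matrix_vector(M, [1, 1]))
```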

7.12 Can you Diagonalize a Matrix? 

At the beginning of section 7.9 I said that the basis in which the components of an operator form a diagonal matrix 
“almost always exists.” There's a technical sense in which this is precisely true, but that's not what you need to know 
in order to manipulate matrices; the theorem that you need to have is that every matrix is the limit of a sequence of 
diagonalizable matrices. If you encounter a matrix that cannot be diagonalized, then you can approximate it as closely as 
you want by a matrix that can be diagonalized, do your calculations, and finally take a limit. You already did this if you 
did problem 4.11, but in that chapter it didn’t look anything like a problem involving matrices, much less diagonalization 
of matrices. Yet it is the same. 

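Take the matrix

$$\begin{pmatrix}1 & 2\\0 & 1\end{pmatrix}$$

You can't diagonalize this. If you try the standard procedure, here is what happens:

$$\begin{pmatrix}1 & 2\\0 & 1\end{pmatrix}\begin{pmatrix}v_1\\v_2\end{pmatrix} = \lambda\begin{pmatrix}v_1\\v_2\end{pmatrix} \quad\text{then}\quad \det\begin{pmatrix}1-\lambda & 2\\0 & 1-\lambda\end{pmatrix} = 0 = (1-\lambda)^2$$

The resulting equations you get for $\lambda = 1$ are

$$0v_1 + 2v_2 = 0 \quad\text{and}\quad 0 = 0$$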


* There is a modification of this convention that appears in chapter 12, section 12.5 
This provides only one eigenvector, a multiple of $\begin{pmatrix}1\\0\end{pmatrix}$. You need two for a basis.

Change this matrix in any convenient way to make the two roots of the characteristic equation different from each
other. For example,

$$\begin{pmatrix}1+\epsilon & 2\\0 & 1\end{pmatrix}$$

The eigenvalue equation is now

$$(1 + \epsilon - \lambda)(1 - \lambda) = 0$$

and the resulting equations for the eigenvectors are

$$\lambda = 1:\ \ \epsilon v_1 + 2v_2 = 0,\ \ 0 = 0 \qquad\qquad \lambda = 1+\epsilon:\ \ 0v_1 + 2v_2 = 0,\ \ \epsilon v_2 = 0$$

Now you have two distinct eigenvectors,

$$\lambda = 1:\ \begin{pmatrix}1\\-\epsilon/2\end{pmatrix}, \quad\text{and}\quad \lambda = 1+\epsilon:\ \begin{pmatrix}1\\0\end{pmatrix}$$

You see what happens to these vectors as $\epsilon \to 0$.
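A short numerical sketch makes the collapse visible. It checks that the two vectors really are eigenvectors of the perturbed matrix and then prints the determinant of the matrix whose columns are those eigenvectors; when that determinant reaches zero the two vectors are parallel and no longer form a basis. The function names are mine, not notation from the text.

```python
def residual(matrix, eigenvalue, vector):
    """Return M v - lambda v, which should vanish for a true eigenpair."""
    return [sum(matrix[i][j] * vector[j] for j in range(2)) - eigenvalue * vector[i]
            for i in range(2)]

def eigenpairs(eps):
    """Eigenpairs of [[1 + eps, 2], [0, 1]] as given in the text."""
    return [(1.0, [1.0, -eps / 2.0]), (1.0 + eps, [1.0, 0.0])]

def basis_determinant(eps):
    """Determinant of the matrix whose columns are the two eigenvectors."""
    (_, v1), (_, v2) = eigenpairs(eps)
    return v1[0] * v2[1] - v2[0] * v1[1]

for eps in (0.1, 0.01, 0.001, 0.0):
    print(eps, basis_determinant(eps))
```

The determinant works out to eps/2, so the would-be similarity transformation stops being invertible exactly at eps = 0.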

Differential Equations at Critical 

Problem 4.11 was to solve the damped harmonic oscillator for the critical case that $b^2 - 4km = 0$.

$$m\frac{d^2x}{dt^2} = -kx - b\frac{dx}{dt} \tag{7.59}$$

Write this as a pair of equations, using the velocity as an independent variable.

$$\frac{dx}{dt} = v_x \quad\text{and}\quad \frac{dv_x}{dt} = -\frac{k}{m}x - \frac{b}{m}v_x$$

In matrix form, this is a matrix differential equation.

$$\frac{d}{dt}\begin{pmatrix}x\\v_x\end{pmatrix} = \begin{pmatrix}0 & 1\\-k/m & -b/m\end{pmatrix}\begin{pmatrix}x\\v_x\end{pmatrix}$$
This is a linear, constant-coefficient differential equation, but now the constant coefficients are matrices. Don’t let that
slow you down. The reason that an exponential form of solution works is that the derivative of an exponential is an
exponential. Assume such a solution here.

$$\begin{pmatrix}x\\v_x\end{pmatrix} = \begin{pmatrix}A\\B\end{pmatrix}e^{\alpha t}, \quad\text{giving}\quad \alpha\begin{pmatrix}A\\B\end{pmatrix}e^{\alpha t} = \begin{pmatrix}0 & 1\\-k/m & -b/m\end{pmatrix}\begin{pmatrix}A\\B\end{pmatrix}e^{\alpha t} \tag{7.60}$$


When you divide the equation by $e^{\alpha t}$, you're left with an eigenvector equation where the eigenvalue is $\alpha$. As usual, to
get a non-zero solution set the determinant of the coefficients to zero and the characteristic equation is

$$\det\begin{pmatrix}0-\alpha & 1\\-k/m & -b/m-\alpha\end{pmatrix} = \alpha(\alpha + b/m) + k/m = 0$$

with familiar roots

$$\alpha_\pm = \left(-b \pm \sqrt{b^2 - 4km}\,\right)/2m$$

If the two roots are equal you may not have distinct eigenvectors, and in this case you do not. No matter, you can solve
any such problem for the case that $b^2 - 4km \ne 0$ and then take the limit as this approaches zero.

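The eigenvectors come from either one of the two equations represented by Eq. (7.60). Pick the simpler one,
$\alpha A = B$. The column matrix $\begin{pmatrix}A\\B\end{pmatrix}$ is then $A\begin{pmatrix}1\\\alpha\end{pmatrix}$.

$$\begin{pmatrix}x\\v_x\end{pmatrix}(t) = A_+\begin{pmatrix}1\\\alpha_+\end{pmatrix}e^{\alpha_+t} + A_-\begin{pmatrix}1\\\alpha_-\end{pmatrix}e^{\alpha_-t}$$

Pick the initial conditions that $x(0) = 0$ and $v_x(0) = v_0$. You must choose some initial conditions in order to apply this
technique. In matrix terminology this is

$$\begin{pmatrix}0\\v_0\end{pmatrix} = A_+\begin{pmatrix}1\\\alpha_+\end{pmatrix} + A_-\begin{pmatrix}1\\\alpha_-\end{pmatrix}$$

These are two equations for the two unknowns

$$A_+ + A_- = 0, \quad \alpha_+A_+ + \alpha_-A_- = v_0, \quad\text{so}\quad A_+ = \frac{v_0}{\alpha_+ - \alpha_-}, \quad A_- = -A_+$$

$$\begin{pmatrix}x\\v_x\end{pmatrix}(t) = \frac{v_0}{\alpha_+ - \alpha_-}\left[\begin{pmatrix}1\\\alpha_+\end{pmatrix}e^{\alpha_+t} - \begin{pmatrix}1\\\alpha_-\end{pmatrix}e^{\alpha_-t}\right]$$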

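If you now take the limit as $b^2 \to 4km$, or equivalently as $\alpha_- \to \alpha_+$, this expression is just the definition of a derivative.

$$\begin{pmatrix}x\\v_x\end{pmatrix}(t) \longrightarrow v_0\frac{d}{d\alpha}\begin{pmatrix}1\\\alpha\end{pmatrix}e^{\alpha t} = v_0\begin{pmatrix}te^{\alpha t}\\(1+\alpha t)e^{\alpha t}\end{pmatrix}, \qquad \alpha = -\frac{b}{2m} \tag{7.61}$$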
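You can watch this limit happen numerically. The sketch below evaluates the position from the two-root solution for damping slightly away from critical and compares it with the limiting form $v_0te^{\alpha t}$. The parameter values are arbitrary illustrative choices, and I use complex arithmetic (`cmath`) so that the same function covers both the overdamped and the underdamped side.

```python
import cmath
import math

def x_two_roots(t, m, k, b, v0):
    """x(t) = v0 (e^{a+ t} - e^{a- t}) / (a+ - a-), valid when b^2 != 4km."""
    root = cmath.sqrt(b * b - 4.0 * k * m)
    a_plus = (-b + root) / (2.0 * m)
    a_minus = (-b - root) / (2.0 * m)
    x = v0 * (cmath.exp(a_plus * t) - cmath.exp(a_minus * t)) / (a_plus - a_minus)
    return x.real  # the imaginary part cancels for real m, k, b

def x_critical(t, m, b, v0):
    """Limiting form: x(t) = v0 t e^{a t} with a = -b / 2m."""
    a = -b / (2.0 * m)
    return v0 * t * math.exp(a * t)

m, k, v0, t = 1.0, 1.0, 1.0, 1.0   # illustrative values; critical damping is b = 2
for delta in (1e-1, 1e-3, 1e-5):
    print(delta, x_two_roots(t, m, k, 2.0 + delta, v0), x_critical(t, m, 2.0, v0))
```

As `delta` shrinks, the first column of results should settle onto the second.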


7.13 Eigenvalues and Google 

The motivating idea behind the search engine Google is that you want the first items returned by a search to be the 
most important items. How do you do this? How do you program a computer to decide which web sites are the most 
important? 

A simple idea is to count the number of sites that contain a link to a given site, and the site that is linked to the
most is then the most important site. This has the drawback that all links are treated as equal. If your site is referenced
from the home page of Al Einstein, it counts no more than if it’s referenced by Joe Blow. This shouldn’t be.

A better idea is to assign each web page a numerical importance rating. If your site, #1, is linked from sites
#11, #59, and #182, then your rating, $x_1$, is determined by adding those ratings (and multiplying by a suitable scaling
constant).

$$x_1 = C\left(x_{11} + x_{59} + x_{182}\right)$$

Similarly the second site's rating is determined by what links to it, as

$$x_2 = C\left(x_{137} + x_{157983} + x_1 + x_{876}\right)$$


But this assumes that you already know the ratings of the sites, and that’s what you’re trying to find!
Write this in matrix language. Each site is an element in a huge column matrix $\{x_i\}$.

$$x_i = C\sum_{j=1}^N \alpha_{ij}x_j \quad\text{or}\quad \begin{pmatrix}x_1\\x_2\\\vdots\end{pmatrix} = C\begin{pmatrix}0 & 0 & 1 & 0 & 1 & \ldots\\1 & 0 & 0 & 0 & 0 & \ldots\\0 & 1 & 0 & 1 & 1 & \ldots\\\ldots & & & & & \end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\end{pmatrix}$$

An entry of 1 indicates a link and a 0 is no link. This is an eigenvector problem with the eigenvalue $\lambda = 1/C$, and though
there are many eigenvectors, there is a constraint that lets you pick the right one. All the $x_i$s must be non-negative,
and there's a theorem (Perron-Frobenius) guaranteeing that you can find such an eigenvector. This algorithm is a key
idea behind Google's ranking methods. They have gone well beyond this basic technique of course, but the spirit of the
method remains.

See www-db.stanford.edu/~backrub/google.html for more on this. 
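To get a feel for the non-negative eigenvector, here is a toy sketch. It uses power iteration (repeatedly applying the link matrix and renormalizing), which is one simple way to find that eigenvector but is my choice for the illustration, not a method described in this section and not a claim about Google's actual implementation. The four-site web and the function name `rank_sites` are hypothetical.

```python
def rank_sites(links, iterations=200):
    """Power iteration for x = C * links * x, normalized so the ratings sum to 1.

    links[i][j] is 1 when site j links to site i, else 0.
    Returns (ratings, eigenvalue) with eigenvalue = 1/C.
    """
    n = len(links)
    x = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(links[i][j] * x[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(y)              # valid estimate because sum(x) == 1
        x = [yi / eigenvalue for yi in y]
    return x, eigenvalue

# Hypothetical web: 0 links to 1 and 2, 1 links to 2, 2 links to 0 and 3, 3 links to 0.
links = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
ratings, eigenvalue = rank_sites(links)
print(ratings, eigenvalue)
```

The iteration settles down here because every site in this toy web can be reached from every other one; a real web graph needs more care than that.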
7.14 Special Operators

Symmetric 

Antisymmetric 

Hermitian 

Antihermitian 

Orthogonal 

Unitary 

Idempotent 

Nilpotent 

Self-adjoint 

In no particular order of importance, these are names for special classes of operators. It is often the case that an 
operator defined in terms of a physical problem will be in some way special, and it’s then worth knowing the consequent 
simplifications. The first ones involve a scalar product. 

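Symmetric: $\langle\vec u, S(\vec v)\rangle = \langle S(\vec u), \vec v\rangle$

Antisymmetric: $\langle\vec u, A(\vec v)\rangle = -\langle A(\vec u), \vec v\rangle$

The inertia operator of Eq. (7.3) is symmetric.

$$I(\vec\omega) = \int dm\,\vec r\times(\vec\omega\times\vec r) \quad\text{satisfies}\quad \langle\vec\omega_1, I(\vec\omega_2)\rangle = \vec\omega_1\cdot I(\vec\omega_2) = \langle I(\vec\omega_1), \vec\omega_2\rangle = I(\vec\omega_1)\cdot\vec\omega_2$$

Proof: Plug in.

$$\vec\omega_1\cdot I(\vec\omega_2) = \vec\omega_1\cdot\int dm\,\vec r\times(\vec\omega_2\times\vec r) = \vec\omega_1\cdot\int dm\,\left[\vec\omega_2\,r^2 - \vec r\,(\vec\omega_2\cdot\vec r)\right]$$

$$= \int dm\,\left[\vec\omega_1\cdot\vec\omega_2\,r^2 - (\vec\omega_1\cdot\vec r)(\vec\omega_2\cdot\vec r)\right] = I(\vec\omega_1)\cdot\vec\omega_2$$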

What good does this do? You will be guaranteed that all eigenvalues are real, all eigenvectors are orthogonal, and the 
eigenvectors form an orthogonal basis. In this example, the eigenvalues are moments of inertia about the axes defined 
by the eigenvectors, so these moments better be real. The magnetic field operator (problem 7.28) is antisymmetric. 

Hermitian operators obey the same identity as symmetric: $\langle\vec u, H(\vec v)\rangle = \langle H(\vec u), \vec v\rangle$. The difference is that in
this case you allow the scalars to be complex numbers. That means that the scalar product has a complex conjugation
implied in the first factor. You saw this sort of operator in the chapter on Fourier series, section 5.3, but it didn't appear
under this name. You will become familiar with this class of operators when you hit quantum mechanics. Then they
are ubiquitous. The same theorem as for symmetric operators applies here, that the eigenvalues are real and that the
eigenvectors are orthogonal.

Orthogonal operators satisfy $\langle O(\vec u), O(\vec v)\rangle = \langle\vec u, \vec v\rangle$. The most familiar example is rotation. When you rotate
two vectors, their magnitudes and the angle between them do not change. That's all that this equation says: scalar
products are preserved by the transformation.

Unitary operators are the complex analog of orthogonal ones: $\langle U(\vec u), U(\vec v)\rangle = \langle\vec u, \vec v\rangle$, but all the scalars are
complex and the scalar product is modified accordingly.

The next couple you don’t see as often. Idempotent means that if you take the square of the operator, it equals 
the original operator. 

Nilpotent means that if you take successive powers of the operator you eventually reach the zero operator. 
Self-adjoint in a finite dimensional vector space is exactly the same thing as Hermitian. In infinite dimensions it is 
not, and in quantum mechanics the important operators are the self-adjoint ones. The issues involved are a bit technical. 
As an aside, in infinite dimensions you need one extra hypothesis for unitary and orthogonal: that they are invertible. 
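A few of these definitions are easy to try on 2x2 matrices. The sketch below uses examples of my own choosing: a projection onto the x-axis for idempotent, a strictly upper-triangular matrix for nilpotent, and a plane rotation for orthogonal. The vectors `u` and `v` are arbitrary test values.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dot(u, v):
    """Ordinary real scalar product."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rotate(theta, v):
    """Rotate a 2-component vector by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

P = [[1, 0], [0, 0]]   # projection onto the x-axis: idempotent, P P = P
N = [[0, 1], [0, 0]]   # nilpotent: N N is the zero matrix

u, v = [1.0, 2.0], [3.0, -1.0]   # arbitrary test vectors
print(matmul(P, P), matmul(N, N))
print(dot(u, v), dot(rotate(0.7, u), rotate(0.7, v)))
```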



7 — Operators and Matrices 




Problems 


7.1 Draw a picture of the effect of these linear transformations on the unit square with vertices at (0, 0), (1, 0), (1, 1), 
(0, 1). The matrices representing the operators are 


(a) 






2 

2 


Is the orientation preserved or not in each case? See the figure at the end of section 7.7 


7.2 Using the same matrices as the preceding question, what is the picture resulting from doing (a) followed by (c)? 
What is the picture resulting from doing (c) followed by (a)? The results of section 7.4 may prove helpful. 


7.3 Look again at the parallelogram that is the image of the unit square in the calculation
of the determinant. In Eq. (7.39) I used the cross product to get its area, but sometimes
a brute-force method is more persuasive. If the transformation has components $\begin{pmatrix}a & b\\c & d\end{pmatrix}$,
the corners of the parallelogram that is the image of the unit square are at $(0,0)$, $(a,c)$,
$(a+b, c+d)$, $(b,d)$. You can compute its area as sums and differences of rectangles and
triangles. Do so; it should give the same result as the method that used a cross product.




7.4 In three dimensions, there is an analogy to the geometric interpretation of the cross product as the area of a
parallelogram. The triple scalar product $\vec A\cdot\vec B\times\vec C$ is the volume of the parallelepiped having these three vectors as
edges. Prove both of these statements starting from the geometric definitions of the two products. That is, from the
$AB\cos\theta$ and $AB\sin\theta$ definitions of the dot product and the magnitude of the cross product (and its direction).


7.5 Derive the relation $\vec v = \vec\omega\times\vec r$ for a point mass rotating about an axis. Refer to the figure before Eq. (7.2).


7.6 You have a mass attached to four springs in a plane and that are in turn attached to four walls as on page 191; the
mass is at equilibrium. Two opposing springs have spring constant $k_1$ and the other two are $k_2$. Push on the mass with a
(small) force $\vec F$ and the resulting displacement of $m$ is $\vec d = f(\vec F)$, defining a linear operator. Compute the components
of $f$ in an obvious basis and check a couple of special cases to see if the displacement is in a plausible direction, especially
if the two $k$'s are quite different.
7.7 On the vector space of quadratic polynomials, degree $\le 2$, the operator $d/dx$ is defined: the derivative of such
a polynomial is a polynomial. (a) Use the basis $\vec e_0 = 1$, $\vec e_1 = x$, and $\vec e_2 = x^2$ and compute the components of this
operator. (b) Compute the components of the operator $d^2/dx^2$. (c) Compute the square of the first matrix and compare
it to the result for (b). Ans: $(a)^2 = (b)$

7.8 Repeat the preceding problem, but look at the case of cubic polynomials, a four-dimensional space. 

7.9 In the preceding problem the basis $1$, $x$, $x^2$, $x^3$ is too obvious. Take another basis, the Legendre polynomials:

$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac32x^2 - \frac12, \quad P_3(x) = \frac52x^3 - \frac32x$$

and repeat the problem, finding components of the first and second derivative operators. Verify an example explicitly to 
check that your matrix reproduces the effect of differentiation on a polynomial of your choice. Pick one that will let you 
test your results. 

7.10 What is the determinant of the inverse of an operator, explaining why? 

Ans: 1/ det(original operator) 

7.11 Eight identical point masses $m$ are placed at the corners of a cube that has one corner at the origin of the
coordinates and has its sides along the axes. The side of the cube is length $= a$. In the basis that is placed along the
axes as usual, compute the components of the inertia tensor. Ans: $I_{11} = 8ma^2$

7.12 For the dumbbell rotating about the off-axis axis in Eq. (7.19), what is the time-derivative of $\vec L$? In very short
time $dt$, what new direction does $\vec L$ take and what then is $d\vec L$? That will tell you $d\vec L/dt$. Prove that this is $\vec\omega\times\vec L$.

7.13 A cube of uniform volume mass density, mass m, and side a has one corner at the origin of the coordinate system 
and the adjacent edges are placed along the coordinate axes. Compute the components of the tensor of inertia. Do it 
(a) directly and (b) by using the parallel axis theorem to check your result. 

Ans: $ma^2\begin{pmatrix}2/3 & -1/4 & -1/4\\-1/4 & 2/3 & -1/4\\-1/4 & -1/4 & 2/3\end{pmatrix}$

7.14 Compute the cube of Eq. (7.13) to find the trigonometric identities for the cosine and sine of triple angles in terms 
of single angle sines and cosines. Compare the results of problem 3.9. 
7.15 On the vectors of column matrices, the operators are matrices. For the two dimensional case take $M = \begin{pmatrix}a & b\\c & d\end{pmatrix}$
and find its components in the basis $\begin{pmatrix}1\\1\end{pmatrix}$ and $\begin{pmatrix}1\\-1\end{pmatrix}$.
What is the determinant of the resulting matrix? Ans: $M_{11} = (a + b + c + d)/2$, and the determinant is still $ad - bc$.

7.16 Show that the tensor of inertia, Eq. (7.3), satisfies $\vec\omega_1\cdot I(\vec\omega_2) = I(\vec\omega_1)\cdot\vec\omega_2$. What does this identity tell you about
the components of the operator when you use the ordinary orthonormal basis? First determine in such a basis what
$\vec e_1\cdot I(\vec e_2)$ is.


7.17 Use the definition of the center of mass to show that the two cross terms in Eq. (7.21) are zero. 


7.18 Prove the Perpendicular Axis Theorem. This says that for a mass that lies flat in a plane, the moment of inertia 
about an axis perpendicular to the plane equals the sum of the two moments of inertia about the two perpendicular axes 
that lie in the plane and that intersect the third axis. 

7.19 Verify in the conventional, non-matrix way that Eq. (7.61) really does provide a solution to the original second 
order differential equation (7.59). 


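7.20 The Pauli spin matrices are

$$\sigma_x = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix}, \quad \sigma_y = \begin{pmatrix}0 & -i\\i & 0\end{pmatrix}, \quad \sigma_z = \begin{pmatrix}1 & 0\\0 & -1\end{pmatrix}$$

Show that $\sigma_x\sigma_y = i\sigma_z$ and the same for cyclic permutations of the indices $x$, $y$, $z$. Compare the products $\sigma_x\sigma_y$ and
$\sigma_y\sigma_x$ and the other pairings of these matrices.

7.21 Interpret $\vec\sigma\cdot\vec A$ as $\sigma_xA_x + \sigma_yA_y + \sigma_zA_z$ and prove that

$$\vec\sigma\cdot\vec A\;\vec\sigma\cdot\vec B = \vec A\cdot\vec B + i\,\vec\sigma\cdot\vec A\times\vec B$$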


where the first term on the right has to include the identity matrix for this to make sense. 


7.22 Evaluate the matrix

$$\frac{I}{I - \vec\sigma\cdot\vec A} = \left(I - \vec\sigma\cdot\vec A\right)^{-1}$$

Evaluate this by two methods: (a) You may assume that $\vec A$ is in some sense small enough for you to manipulate by infinite
series methods. This then becomes a geometric series that you can sum. Use the results of the preceding problem. 

(b) You can manipulate the algebra directly without series. I suggest that you recall the sort of manipulation that allows 
you to write the complex number 1/(1 — i) without any i's in the denominator. 

I suppose you could do it a third way, writing out the 2x2 matrix and explicitly inverting it, but I definitely don't 
recommend this. 

7.23 Evaluate the sum of the infinite series defined by $e^{-i\sigma_y\theta}$. Where have you seen this result before? The first term
in the series must be interpreted as the identity matrix. Ans: $I\cos\theta - i\sigma_y\sin\theta$

7.24 For the moment of inertia about an axis, the integral is $\int r_\perp^2\,dm$. State precisely what this $m$ function must be
for this to make sense as a Riemann-Stieltjes integral, Eq. (1.28). For the case that you have eight masses, all $m_0$ at
the 8 corners of a cube of side $a$, write explicitly what this function is and evaluate the moment of inertia about an axis
along one edge of the cube.

7.25 The summation convention allows you to write some compact formulas. Evaluate these, assuming that you're
dealing with three dimensions. Note Eq. (7.30). Define the alternating symbol $\epsilon_{ijk}$ to be

1: It is totally anti-symmetric. That is, interchange any two indices and you change the sign of the value.

2: $\epsilon_{123} = 1$. [E.g. $\epsilon_{132} = -1$, $\epsilon_{312} = +1$]

$$\delta_{ii}, \quad \delta_{ij}\epsilon_{ijk}, \quad \delta_{mn}A_mB_n, \quad S_{mn}u_mv_n, \quad u_nv_n,$$

$$\epsilon_{ijk}\epsilon_{mnk} = \delta_{im}\delta_{jn} - \delta_{in}\delta_{jm}$$

Multiply the last identity by $A_jB_mC_n$ and interpret.

7.26 The set of Hermite polynomials starts out as

$$H_0 = 1, \quad H_1 = 2x, \quad H_2 = 4x^2 - 2, \quad H_3 = 8x^3 - 12x, \quad H_4 = 16x^4 - 48x^2 + 12,$$

(a) For the vector space of cubic polynomials in $x$, choose a basis of Hermite polynomials and compute the matrix of
components of the differentiation operator, $d/dx$.

(b) Compute the components of the operator $d^2/dx^2$ and show the relation between this matrix and the preceding one.

7.27 On the vector space of functions of $x$, define the translation operator

$$T_af = g \quad\text{means}\quad g(x) = f(x - a)$$

This picks up a function and moves it by $a$ to the right.

(a) Pick a simple example function $f$ and test this definition graphically to verify that it does what I said.

(b) On the space of cubic polynomials and using a basis of your choice, find the components of this operator. 

(c) Square the resulting matrix and verify that the result is as it should be. 

(d) What is the inverse of the matrix? (You should be able to guess the answer and then verify it. Or you can work out 
the inverse the traditional way.) 

(e) What if the parameter $a$ is huge? Interpret some of the components of this first matrix and show why they are clearly
correct. (If they are.) 

(f) What is the determinant of this operator? 

(g) What are the eigenvectors and eigenvalues of this operator? 

7.28 The force by a magnetic field on a moving charge is $\vec F = q\vec v\times\vec B$. The operation $\vec v\times\vec B$ defines a linear operator
on $\vec v$, stated as $f(\vec v) = \vec v\times\vec B$. What are the components of this operator expressed in terms of the three components of
the vector $\vec B$? What are the eigenvectors and eigenvalues of this operator? For this last part, pick the basis in which you
want to do the computations. If you’re not careful about this choice, you are asking for a lot of algebra. Ans: eigenvalues:
$0$, $\pm iB$

7.29 In section 7.8 you have an operator M expressed in two different bases. What is its determinant computed in each 
basis? 

7.30 In a given basis, an operator has the values

$$A(\vec e_1) = \vec e_1 + 3\vec e_2 \quad\text{and}\quad A(\vec e_2) = 2\vec e_1 + 4\vec e_2$$

(a) Draw a picture of what this does, (b) Find the eigenvalues and eigenvectors and determinant of A and see how this 
corresponds to the picture you just drew. 

7.31 The characteristic polynomial of a matrix $M$ is $\det(M - \lambda I)$. $I$ is the identity matrix and $\lambda$ is the variable in
the polynomial. Write the characteristic polynomial for the general $2\times2$ matrix. Then in place of $\lambda$ in this polynomial,
put the matrix $M$ itself. The constant term will have to include the factor $I$ as usual. For this $2\times2$ case verify the
Cayley-Hamilton Theorem, that the matrix satisfies its own characteristic equation, making this polynomial in $M$ the
zero matrix.

7.32 (a) For the magnetic field operator defined in problem 7.28, place $\hat z = \vec e_3$ along the direction of $\vec B$. Then take
$\vec e_1 = (\hat x - i\hat y)/\sqrt2$, $\vec e_2 = (\hat x + i\hat y)/\sqrt2$ and find the components of the linear operator representing the magnetic field.
(b) A charged particle is placed in this field and the equations of motion are $m\vec a = \vec F = q\vec v\times\vec B$. Translate this into the
operator language with a matrix like that of problem 7.28, and write $\vec F = m\vec a$ in this language and this basis. Ans: (part)
$m\ddot r_1 = -iqB\dot r_1$, where $r_1 = (x + iy)/\sqrt2$. $m\ddot r_2 = +iqB\dot r_2$, where $r_2 = (x - iy)/\sqrt2$.

7.33 For the operator in problem 7.27 part (b), what are the eigenvectors and eigenvalues? 

7.34 A nilpotent operator was defined in section 7.14. For the operator defined in problem 7.8, show that it is nilpotent. 
How does this translate into the successive powers of its matrix components? 

7.35 A cube of uniform mass density has side a and mass m. Evaluate its moment of inertia about an axis along a 
longest diagonal of the cube. Note: If you find yourself entangled in a calculation having multiple integrals with hopeless 
limits of integration, toss it out and start over. You may even find problem 7.18 useful. Ans: ma 2 / 6 

7.36 Show that the set of all 2 x 2 matrices forms a vector space. Produce a basis for it, and so what is its dimension? 

7.37 In the vector space of the preceding problem, the following transformation defines an operator. $f(M) = SMS^{-1}$.
For $S$, use the rotation matrix of Eq. (7.13) and compute the components of this operator $f$. The obvious choice of
basis would be matrices with a single non-zero element 1. Instead, try the basis $I$, $\sigma_x$, $\sigma_y$, $\sigma_z$. Ans: A rotation by $2\alpha$
about the $y$-axis, e.g. $f(\vec e_1) = \vec e_1\cos2\alpha - \vec e_3\sin2\alpha$.

7.38 What are the eigenvectors and eigenvalues of the operator in the preceding problem? Now you’ll be happy I 
suggested the basis that I did. 

7.39 (a) The commutator of two matrices is defined to be $[A, B] = AB - BA$. Show that this commutator satisfies
the Jacobi identity.

$$[A,[B,C]] + [B,[C,A]] + [C,[A,B]] = 0$$

(b) The anti-commutator of two matrices is $\{A, B\} = AB + BA$. Show that there is an identity like the Jacobi identity,
but with one of the two commutators (the inner one or the outer one) replaced by an anti-commutator. I’ll leave it to
you to figure out which.

7.40 Diagonalize each of the Pauli spin matrices of problem 7.20. That is, find their eigenvalues and specify the 
respective eigenvectors as the basis in which they are diagonal. 

7.41 What are the eigenvalues and eigenvectors of the rotation matrix Eq. (7.13)? Translate the answer back into a 
statement about rotating vectors, not just their components. 
7.42 Same as the preceding problem, but replace the circular trigonometric functions in Eq. (7.13) with hyperbolic ones.
Also change the sole minus sign in the matrix to a plus sign. Draw pictures of what this matrix does to the basis vectors. 
What is its determinant? 

7.43 Compute the eigenvalues and eigenvectors of the matrix Eq. (7.18). Interpret each. 

7.44 Look again at the vector space of problem 6.36 and use the basis $f_1$, $f_2$, $f_3$ that you constructed there. (a) In
this basis, what are the components of the two operators described in that problem? 

(b) What is the product of these two matrices? Do it in the order so that it represents the composition of the first 
rotation followed by the second rotation. 

(c) Find the eigenvectors of this product and from the result show that the combination of the two rotations is a third 
rotation about an axis that you can now specify. Can you anticipate before solving it, what one of the eigenvalues will 
be? 

(d) Does a sketch of this rotation axis agree with what you should get by doing the two original rotations in order? 

7.45 Verify that the Gauss elimination method of Eq. (7.44) agrees with (7.38). 

7.46 What is the determinant of a nilpotent operator? See problem 7.34. 

7.47 (a) Recall (or look up) the method for evaluating a determinant using cofactors (or minors). For $2\times2$, $3\times3$, and
in fact, for $N\times N$ arrays, how many multiplication operations are required for this? Ignore the time to do any additions
and assume that a computer can do a product in $10^{-10}$ seconds. How much time does it take by this method to do the
determinant for $10\times10$, $20\times20$, and $30\times30$ arrays? Express the times in convenient units.

(b) Repeat this for the Gauss elimination algorithm at Eq. (7.44). How much time for the above three matrices and for
$100\times100$ and $1000\times1000$? Count division as taking the same time as multiplication. Ans: For the first method, $30\times30$
requires $10\,000\times$ age of universe. For Gauss it is $3\,\mu$s.

7.48 On the vector space of functions on 0 < x < L, (a) use the basis of complex exponentials, Eq. (5.20), and 
compute the matrix components of d/dx. 

(b) Use the basis of Eq. (5.17) to do the same thing. 

7.49 Repeat the preceding problem, but for $d^2/dx^2$. Compare the result here to the squares of the matrices from that
problem. 

7.50 Repeat problem 7.27 but using a different vector space of functions with basis

(a) $e^{n\pi ix/L}$, ($n = 0, \pm1, \pm2, \ldots$)
(b) $\cos(n\pi x/L)$, and $\sin(m\pi x/L)$.

These functions will be a basis in the set of periodic functions of $x$, and these will be very big matrices.

7.51 (a) What is the determinant of the translation operator of problem 7.27? 

(b) What is the determinant of d/dx on the vector space of problem 7.26? 

7.52 (a) Write out the construction of the trace in the case of a three dimensional operator, analogous to Eq. (7.47).
What are the coefficients of $\epsilon^2$ and $\epsilon^3$? (b) Back in the two dimensional case, draw a picture of what $(I + \epsilon f)$ does to
the basis vectors to first order in $\epsilon$.

7.53 Evaluate the trace for arbitrary dimension. Use the procedure of Gauss elimination to compute the determinant,
and note at every step that you are keeping terms only through $\epsilon^0$ and $\epsilon^1$. Any higher orders can be dropped as soon as
they appear. Ans: $\sum_{i=1}^N f_{ii}$

7.54 The set of all operators on a given vector space forms a vector space.* (Show this.) Consider whether you can or 
should restrict yourself to real numbers or if you ought to be dealing with complex scalars. 

Now what about the list of operators in section 7.14. Which of them form vector spaces? 

Ans: Yes(real), Yes(real), No, No, No, No, No, No, No 

7.55 In the vector space of cubic polynomials, choose the basis

$$\vec e_0 = 1, \quad \vec e_1 = 1 + x, \quad \vec e_2 = 1 + x + x^2, \quad \vec e_3 = 1 + x + x^2 + x^3.$$

In this basis, compute the matrix of components of the operator $P$, where this is the parity operator, defined as the
operator that takes the variable $x$ and changes it to $-x$. For example $P(\vec e_1) = 1 - x$. Compute the square of the
resulting matrix. What is the determinant of $P$? If you had only the quadratic polynomials with basis $\vec e_0$, $\vec e_1$, $\vec e_2$, what
is the determinant? What about linear polynomials, with basis $\vec e_0$, $\vec e_1$? Maybe even constant polynomials?

7.56 On the space of quadratic polynomials define an operator that permutes the coefficients: $f(x) = ax^2 + bx + c$,
then $Of = g$ has $g(x) = bx^2 + cx + a$. Find the eigenvalues and eigenvectors of this operator.

7.57 The result in Eq. (7.36) is a rotation about some axis. Where is it? Notice that a rotation about an axis leaves
the axis itself alone, so this is an eigenvector problem. If it leaves the vector alone, you even know what the eigenvalue 
is, so you can easily find the vector. Repeat for the other rotation, found in Eq. (7.37) Ans: e\ + e *2 — 

7.58 Find the eigenvectors and eigenvalues of the matrices in problem 7.1. 

* If you’re knowledgeable enough to recognize the difficulty caused by the question of domains, you’ll recognize that 
this is false in infinite dimensions. But if you know that much then you don't need to be reading this chapter. 
The world is not one-dimensional, and calculus doesn’t stop with a single independent variable. The ideas of partial
derivatives and multiple integrals are not too different from their single-variable counterparts, but some of the details 
about manipulating them are not so obvious. Some are downright tricky. 

8.1 Partial Derivatives 

The basic idea of derivatives and of integrals in two, three, or more dimensions follows the same pattern as for one 
dimension. They're just more complicated. 

The derivative of a function of one variable is defined as 


$$\frac{df(x)}{dx} = \lim_{\Delta x\to0}\frac{f(x + \Delta x) - f(x)}{\Delta x} \tag{8.1}$$

You would think that the definition of a derivative of a function of $x$ and $y$ would then be defined as

$$\frac{\partial f(x,y)}{\partial x} = \lim_{\Delta x\to0}\frac{f(x + \Delta x, y) - f(x,y)}{\Delta x}$$

and more-or-less it is. The $\partial$ notation instead of $d$ is a reminder that there are other coordinates floating around that
are temporarily being treated as constants.

In order to see why I used the phrase “more-or-less,” take a very simple example: $f(x,y) = y$. Use the preceding
definition, and because $y$ is being held constant, the derivative $\partial f/\partial x = 0$. What could be easier?

I don’t like these variables so I’ll switch to a different set of coordinates, $x'$ and $y'$:

$$y' = x + y \quad\text{and}\quad x' = x$$

What is $\partial f/\partial x'$ now?

$$f(x,y) = y = y' - x = y' - x'$$

Now the derivative of $f$ with respect to $x'$ is $-1$, because I’m keeping the other coordinate fixed. Or is the derivative still
zero because $x' = x$ and I'm taking $\partial f/\partial x$ and why should that change just because I’m using a different coordinate
system?
The problem is that the notation is ambiguous. When you see $\partial f/\partial x$ it doesn’t tell you what to hold constant.
Is it to be $y$ or $y'$ or yet something else? In some contexts the answer is clear and you won't have any difficulty deciding,
but you’ve already encountered cases for which the distinction is crucial. In thermodynamics, when you add heat to a
gas to raise its temperature does this happen at constant pressure or at constant volume or with some other constraint?
The specific heat at constant pressure is not the same as the specific heat at constant volume; it is necessarily bigger
because during an expansion some of the energy has to go into the work of changing the volume. This sort of derivative
depends on the type of process that you’re using, and for a classical ideal gas the difference between the two molar specific
heats obeys the equation

$$C_P - C_V = R$$

If the gas isn’t ideal, this equation is replaced by a more complicated and general one, but the same observation applies,
that the two derivatives $dQ/dT$ aren’t the same.

In thermodynamics there are so many variables in use that there is a standard notation for a partial derivative, 
indicating exactly which other variables are to be held constant. 


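$$ \left(\frac{\partial U}{\partial V}\right)_T \qquad\text{and}\qquad \left(\frac{\partial U}{\partial V}\right)_P $$

represent the change in the internal energy of an object per change in volume during processes in which respectively the 
temperature and the pressure are held constant. In the previous example with the function f = y, this says 

$$ \left(\frac{\partial f}{\partial x}\right)_y = 0 \qquad\text{and}\qquad \left(\frac{\partial f}{\partial x}\right)_{y'} = -1 $$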


This notation is a way to specify the direction in the x-y plane along which you're taking the derivative. 
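
As a rough numerical illustration of this point, here is a minimal Python sketch. The helper names `dfdx_hold_y` and `dfdx_hold_yprime` are mine and purely illustrative. It takes the example f = y and estimates the x-derivative by a centered difference twice: once stepping in x with y fixed, once stepping in x with y′ = x + y fixed.

```python
def f(x, y):
    # The example from the text: f(x, y) = y
    return y

def dfdx_hold_y(x, y, h=1e-6):
    # Step in x while y stays fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def dfdx_hold_yprime(x, y, h=1e-6):
    # Step in x while y' = x + y stays fixed, so y must change to compensate
    yp = x + y
    return (f(x + h, yp - (x + h)) - f(x - h, yp - (x - h))) / (2 * h)

print(dfdx_hold_y(1.0, 2.0))       # close to 0
print(dfdx_hold_yprime(1.0, 2.0))  # close to -1
```

The same function and the same x give two different answers. The only thing that changed is the direction in the x-y plane along which the step was taken.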


8.2 Chain Rule 

For functions of one variable, the chain rule allows you to differentiate with respect to still another variable: y a function 
of x and x a function of t allows 

$$ \frac{dy}{dt} = \frac{dy}{dx}\,\frac{dx}{dt} $$

You can derive this simply from the definition of a derivative. 


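$$ \begin{aligned}
\frac{\Delta y}{\Delta t} &= \frac{y\bigl(x(t + \Delta t)\bigr) - y\bigl(x(t)\bigr)}{\Delta t} \\
&= \frac{y\bigl(x(t + \Delta t)\bigr) - y\bigl(x(t)\bigr)}{x(t + \Delta t) - x(t)} \cdot \frac{x(t + \Delta t) - x(t)}{\Delta t} = \frac{\Delta y}{\Delta x}\,\frac{\Delta x}{\Delta t}
\end{aligned} \tag{8.3} $$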
Take the limit of this product as Δt → 0. Necessarily then you have that Δx → 0 too (unless the derivative doesn’t 
exist anyway). The second factor is then the definition of the derivative dx/dt, and the first factor is the definition of 
dy/dx. The Leibniz notation as written in Eq. (8.3) leads you to the required proof. 

What happens with more variables? Roughly the same thing but with more manipulation, the same sort of 
manipulation that you use to derive the rule for differentiating more complicated functions of one variable (as in section 
1.5). 

Compute 

$$ \frac{d}{dt} f\bigl(x(t), y(t)\bigr) $$

Back to the Δ’s. The manipulation is much like the preceding except that you have to add and subtract a term in the 
second line. 

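$$ \begin{aligned}
\frac{\Delta f}{\Delta t} &= \frac{f\bigl(x(t + \Delta t), y(t + \Delta t)\bigr) - f\bigl(x(t), y(t)\bigr)}{\Delta t} \\
&= \frac{f\bigl(x(t + \Delta t), y(t + \Delta t)\bigr) - f\bigl(x(t), y(t + \Delta t)\bigr) + f\bigl(x(t), y(t + \Delta t)\bigr) - f\bigl(x(t), y(t)\bigr)}{\Delta t} \\
&= \frac{f\bigl(x(t + \Delta t), y(t + \Delta t)\bigr) - f\bigl(x(t), y(t + \Delta t)\bigr)}{x(t + \Delta t) - x(t)} \cdot \frac{x(t + \Delta t) - x(t)}{\Delta t} \\
&\qquad + \frac{f\bigl(x(t), y(t + \Delta t)\bigr) - f\bigl(x(t), y(t)\bigr)}{y(t + \Delta t) - y(t)} \cdot \frac{y(t + \Delta t) - y(t)}{\Delta t} \\
&= \frac{\Delta f}{\Delta x}\,\frac{\Delta x}{\Delta t} + \frac{\Delta f}{\Delta y}\,\frac{\Delta y}{\Delta t}
\end{aligned} $$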


In the first factor of the first term, Δf/Δx, the variable x is changed but y is not. In the first factor of the second 
term, the reverse holds true. The limit of this expression is then 

$$ \lim_{\Delta t \to 0} \frac{\Delta f}{\Delta t} = \frac{df}{dt} = \left(\frac{\partial f}{\partial x}\right)_y \frac{dx}{dt} + \left(\frac{\partial f}{\partial y}\right)_x \frac{dy}{dt} \tag{8.4} $$


If these manipulations look familiar, it's probably because they mimic the procedures of section 1.5. That case is like 
this one, with the special values x = y = t. 

Example: (When you want to check out an equation, you should construct an example so that it reveals a lot of 
structure without requiring a lot of calculation.) 


$$ f(x,y) = Axy^2, \qquad\text{and}\qquad x(t) = Ct^3, \quad y(t) = Dt^2 $$

First do it using the chain rule. 

$$ \begin{aligned}
\frac{df}{dt} &= \left(\frac{\partial f}{\partial x}\right)_y \frac{dx}{dt} + \left(\frac{\partial f}{\partial y}\right)_x \frac{dy}{dt} \\
&= (Ay^2)(3Ct^2) + (2Axy)(2Dt) \\
&= \bigl(A(Dt^2)^2\bigr)(3Ct^2) + \bigl(2A(Ct^3)(Dt^2)\bigr)(2Dt) \\
&= 7ACD^2 t^6
\end{aligned} $$

Now repeat the calculation by first substituting the values of x and y and then differentiating. 

$$ \frac{df}{dt} = \frac{d}{dt}\Bigl[A(Ct^3)(Dt^2)^2\Bigr] = \frac{d}{dt}\Bigl[ACD^2 t^7\Bigr] = 7ACD^2 t^6 $$
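
The same check can be run numerically. The following is a minimal Python sketch. The values of A, C, and D are arbitrary numbers I picked for testing, not anything from the text. It compares the chain-rule expression, a centered finite difference of f(x(t), y(t)), and the closed form 7ACD²t⁶.

```python
A, C, D = 2.0, 3.0, 5.0   # arbitrary test values (an assumption for illustration)

def x(t):
    return C * t**3

def y(t):
    return D * t**2

def f(x_, y_):
    return A * x_ * y_**2

def chain_rule(t):
    dfdx = A * y(t)**2            # (df/dx) at constant y
    dfdy = 2 * A * x(t) * y(t)    # (df/dy) at constant x
    return dfdx * (3 * C * t**2) + dfdy * (2 * D * t)

def direct(t, h=1e-5):
    # Substitute first, then differentiate numerically with respect to t
    return (f(x(t + h), y(t + h)) - f(x(t - h), y(t - h))) / (2 * h)

def closed_form(t):
    return 7 * A * C * D**2 * t**6

t = 1.3
print(chain_rule(t), direct(t), closed_form(t))
```

All three numbers should agree to many digits; the finite difference is the least accurate of the three.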


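What if f also has an explicit t in it: f(t, x(t), y(t))? That simply adds another term. Remember, dt/dt = 1. 

$$ \frac{df}{dt} = \left(\frac{\partial f}{\partial t}\right)_{x,y} + \left(\frac{\partial f}{\partial x}\right)_{y,t} \frac{dx}{dt} + \left(\frac{\partial f}{\partial y}\right)_{x,t} \frac{dy}{dt} \tag{8.5} $$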


Sometimes you see the chain rule written in a slightly different form. You can change coordinates from (x, y) to 
(r, φ), switching from rectangular to polar. You can switch from (x, y) to a system such as (x′, y′) = (x + y, x − y). The 
function can be expressed in the new coordinates explicitly. Solve for x, y in terms of r, φ or x′, y′ and then differentiate 
with respect to the new coordinate. OR you can use the chain rule to differentiate with respect to the new variable. 

$$ \left(\frac{\partial f}{\partial x'}\right)_{y'} = \left(\frac{\partial f}{\partial x}\right)_y \left(\frac{\partial x}{\partial x'}\right)_{y'} + \left(\frac{\partial f}{\partial y}\right)_x \left(\frac{\partial y}{\partial x'}\right)_{y'} $$

This is actually not a different equation from Eq. (8.4). It only looks different because in addition to t there’s another 
variable that you have to keep constant: t → x′, and y′ is constant. 
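Example: When you switch from rectangular to plane polar coordinates what is ∂f/∂φ in terms of the x and y 
derivatives? 

$$ x = r\cos\phi, \qquad y = r\sin\phi, $$

so 

$$ \begin{aligned}
\left(\frac{\partial f}{\partial \phi}\right)_r &= \left(\frac{\partial f}{\partial x}\right)_y \left(\frac{\partial x}{\partial \phi}\right)_r + \left(\frac{\partial f}{\partial y}\right)_x \left(\frac{\partial y}{\partial \phi}\right)_r \\
&= \left(\frac{\partial f}{\partial x}\right)_y (-r\sin\phi) + \left(\frac{\partial f}{\partial y}\right)_x (r\cos\phi)
\end{aligned} $$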


If f(x,y) = x² + y² this better be zero, because I’m finding how f changes when r is held fixed. Check it out; it is. 
The equation (8.6) presents the form that is most important in many applications. 
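
For a quick check of that claim, here is a small Python sketch. The function name `dfdphi_at_fixed_r` and the sample point are mine, chosen only for illustration. It evaluates the polar chain-rule expression for f = x² + y², whose rectangular derivatives are 2x and 2y.

```python
import math

def dfdphi_at_fixed_r(dfdx, dfdy, r, phi):
    # (df/dphi)_r = (df/dx)_y * (-r sin phi) + (df/dy)_x * (r cos phi)
    return dfdx * (-r * math.sin(phi)) + dfdy * (r * math.cos(phi))

r, phi = 2.0, 0.7                      # arbitrary sample point
x, y = r * math.cos(phi), r * math.sin(phi)

# f = x^2 + y^2 has (df/dx)_y = 2x and (df/dy)_x = 2y
print(dfdphi_at_fixed_r(2 * x, 2 * y, r, phi))   # zero up to rounding
```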

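Example: What is the derivative of y with respect to φ at constant x? 

$$ \begin{aligned}
\left(\frac{\partial y}{\partial \phi}\right)_x &= \left(\frac{\partial y}{\partial r}\right)_\phi \left(\frac{\partial r}{\partial \phi}\right)_x + \left(\frac{\partial y}{\partial \phi}\right)_r \left(\frac{\partial \phi}{\partial \phi}\right)_x \\
&= [\sin\phi]\, r\,\frac{\sin\phi}{\cos\phi} + [r\cos\phi] \cdot 1 = \frac{r}{\cos\phi}
\end{aligned} \tag{8.7} $$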


You see a graphical interpretation of the calculation in this diagram: φ changes by Δφ, so the coordinate moves 
up by Δy (x is constant). The angle between the lines Δy and rΔφ is φ itself. This means that Δy ÷ rΔφ = 1/cos φ, 
and that is precisely the preceding equation for (∂y/∂φ)_x. 

In doing the calculation leading to Eq. (8.7), do you see how to do the calculation for (∂r/∂φ)_x? Differentiate 
the equation x = r cos φ with respect to φ at constant x. 

$$ \left(\frac{\partial x}{\partial \phi}\right)_x = 0 = \left(\frac{\partial r}{\partial \phi}\right)_x \cos\phi + r\left(\frac{\partial \cos\phi}{\partial \phi}\right)_x = \left(\frac{\partial r}{\partial \phi}\right)_x \cos\phi - r\sin\phi $$

Solve for the unknown derivative and you have the result. 

Another example: f(x,y) = x² − 2xy. The transformation between rectangular and polar coordinates is x = r cos φ, 
y = r sin φ. What is (∂f/∂x)_r? 

$$ \left(\frac{\partial f}{\partial x}\right)_r = \left(\frac{\partial f}{\partial x}\right)_y \left(\frac{\partial x}{\partial x}\right)_r + \left(\frac{\partial f}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_r = (2x - 2y) + (-2x)\left(\frac{\partial y}{\partial x}\right)_r $$

$$ \left(\frac{\partial y}{\partial x}\right)_r = \frac{(\partial y/\partial \phi)_r}{(\partial x/\partial \phi)_r} = \frac{r\cos\phi}{-r\sin\phi} = -\cot\phi \tag{8.8} $$

(Remember problem 1.49?) Put these together and 

$$ \left(\frac{\partial f}{\partial x}\right)_r = (2x - 2y) + (-2x)(-\cot\phi) = 2x - 2y + 2x\cot\phi \tag{8.9} $$


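The brute-force way to do this is to express the function f explicitly in terms of the variables x and r, eliminating y and φ. 

$$ y = r\sin\phi = \sqrt{r^2 - x^2}, \qquad\text{then} $$

$$ \begin{aligned}
\left(\frac{\partial f}{\partial x}\right)_r &= \frac{\partial}{\partial x}\Bigl[x^2 - 2x\sqrt{r^2 - x^2}\Bigr]_r \\
&= 2x - 2\sqrt{r^2 - x^2} - 2x\,\frac{1}{\sqrt{r^2 - x^2}}\,(-x) = 2x + \frac{-2(r^2 - x^2) + 2x^2}{\sqrt{r^2 - x^2}}
\end{aligned} \tag{8.10} $$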


You can see that this is the same as the equation (8.9) if you look at the next-to-last form of equation (8.10). 

$$ \frac{x}{\sqrt{r^2 - x^2}} = \frac{r\cos\phi}{\sqrt{r^2 - r^2\cos^2\phi}} = \cot\phi $$


Is this result reasonable? Look at what happens to y when you change x by a little bit. Constant r is a circle, and if φ 
puts the position over near the right side (ten or twenty degrees), a little change in x causes a big change in y as shown 
by the rectangle. As drawn, Δy/Δx is big and negative, sort of like the (negative) cotangent of φ as in Eq. (8.8). 
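
You can also test Eq. (8.9) with numbers. This Python sketch is only an illustration: the function names are mine, and it assumes a point on the upper half of the circle, where y = +√(r² − x²). It moves a little in x along the circle and compares the resulting finite difference with 2x − 2y + 2x cot φ.

```python
import math

def f(x, y):
    return x**2 - 2 * x * y

def dfdx_fixed_r(x, r, h=1e-6):
    # Move in x along the circle of radius r (upper half, y > 0)
    def y_on_circle(xx):
        return math.sqrt(r**2 - xx**2)
    return (f(x + h, y_on_circle(x + h)) - f(x - h, y_on_circle(x - h))) / (2 * h)

def formula(x, r):
    y = math.sqrt(r**2 - x**2)
    cot_phi = x / y                        # cot(phi) = cos(phi)/sin(phi) = x/y
    return 2 * x - 2 * y + 2 * x * cot_phi

print(dfdx_fixed_r(1.2, 2.0), formula(1.2, 2.0))
```

At x = 1.2, r = 2 the point is (1.2, 1.6), so cot φ = 0.75 and the formula gives 2.4 − 3.2 + 1.8 = 1.0.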


8.3 Differentials 

For a function of a single variable you can write 


$$ df = \frac{df}{dx}\,dx \tag{8.11} $$


and read (sort of) that the infinitesimal change in the function f is the slope times the infinitesimal change in x. Does 
this really make any sense? What is an infinitesimal change? Is it zero? Is dx a number or isn’t it? What’s going on? 

It is possible to translate this intuitive idea into something fairly simple and that makes perfectly good sense. 
Once you understand what it really means you’ll be able to use the intuitive idea and its notation with more security. 
Let g be a function of two variables, x and h. 

$$ g(x,h) = \frac{df(x)}{dx}\,h \quad\text{has the property that}\quad \frac{1}{h}\bigl[f(x + h) - f(x) - g(x,h)\bigr] \to 0 \ \text{ as } h \to 0 $$

That is, the function g(x,h) approximates very well the change in f as you go from x to x + h. The difference between 
g and Δf = f(x + h) − f(x) goes to zero so fast that even after you’ve divided by h the difference goes to zero. 

The usual notation is to use the symbol dx instead of h and to call the function df instead* of g. 


$$ df(x, dx) = f'(x)\,dx \quad\text{has the property that}\quad \frac{1}{dx}\bigl|f(x + dx) - f(x) - df(x, dx)\bigr| \to 0 \ \text{ as } dx \to 0 \tag{8.12} $$

In this language dx is just another variable that can go from −∞ to +∞ and df is just a specified function of two 
variables. The point is that this function is useful because when the variable dx is small df provides a very good 
approximation to the increment Δf in f. 

What is the volume of the peel on an orange? The volume of a sphere is V = 4πr³/3, so its differential is 
dV = 4πr² dr. If the radius of the orange is 3 cm and the thickness of the peel is 2 mm, the volume of the peel is 

$$ dV = 4\pi r^2\,dr = 4\pi (3\,\text{cm})^2 (0.2\,\text{cm}) = 23\ \text{cm}^3 $$

The whole volume of the orange is (4/3)π(3 cm)³ = 113 cm³, so this peel is about 20% of the volume. 
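
How good is the differential here? A minimal Python sketch makes the comparison. One assumption of mine, not stated in the text: the 3 cm radius is measured to the outside of the peel, so the exact peel volume is V(3) − V(2.8).

```python
import math

def sphere_volume(r):
    return 4 * math.pi * r**3 / 3

def dV(r, dr):
    # The differential of the volume: dV = 4 pi r^2 dr
    return 4 * math.pi * r**2 * dr

r, dr = 3.0, 0.2   # cm
estimate = dV(r, dr)
exact_shell = sphere_volume(r) - sphere_volume(r - dr)   # assumes the peel lies inside r
print(estimate, exact_shell)
```

The estimate is about 22.6 cm³ and the exact shell about 21.1 cm³, so at dr/r ≈ 7% the differential is off by several percent. Good enough for an orange.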

Differentials in Several Variables 

The analog of Eq. (8.11) for several variables is 

$$ df = df(x, y, dx, dy) = \left(\frac{\partial f}{\partial x}\right)_y dx + \left(\frac{\partial f}{\partial y}\right)_x dy \tag{8.13} $$

Roughly speaking, near a point in the x-y plane, the value of the function f changes as a linear function of the coordinates 
as you move a (little) distance away. This function df describes this change to high accuracy. It bears the same relation 
to Eq. (8.4) that (8.11) bears to Eq. (8.3). 

* Who says that a variable in algebra must be a single letter? You would never write a computer program that way. 
d Fred²/d Fred = 2 Fred is perfectly sensible. 
For example, take the function f(x, y) = x² + y². At the point (x, y) = (1, 2), the differential is 

$$ df(1, 2, dx, dy) = (2x)\Big|_{(1,2)}\,dx + (2y)\Big|_{(1,2)}\,dy = 2\,dx + 4\,dy $$

so that 

$$ f(1.01, 1.99) \approx f(1, 2) + df(1, 2, .01, -.01) = 1^2 + 2^2 + 2(.01) + 4(-.01) = 4.98 $$

compared to the exact answer, 4.9802. 
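
The same arithmetic in a few lines of Python, as a sketch with function names of my own choosing:

```python
def f(x, y):
    return x**2 + y**2

def df(x, y, dx, dy):
    # Differential of f = x^2 + y^2: df = 2x dx + 2y dy
    return 2 * x * dx + 2 * y * dy

approx = f(1, 2) + df(1, 2, 0.01, -0.01)
exact = f(1.01, 1.99)
print(approx, exact, exact - approx)
```

The leftover difference is dx² + dy² = 0.0002, which is second order in the small quantities, just as the definition of the differential requires.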

The equation analogous to (8.12) is 

$$ df(x, y, dx, dy) \quad\text{has the property that}\quad \frac{1}{dr}\bigl|f(x + dx, y + dy) - f(x, y) - df(x, y, dx, dy)\bigr| \to 0 \ \text{ as } dr \to 0 \tag{8.14} $$

where dr = √(dx² + dy²) is the distance to (x, y). It’s not that you will be able to do a lot more with this precise 
definition than you could with the intuitive idea. You will however be able to work with a better understanding of your 
actions. When you say that “dx is an infinitesimal” you can understand that this means simply that dx is any number 
but that the equations using it are useful only for very small values of that number. 

You can’t use this notation for everything, as the notation for the derivative demonstrates. The symbol “df/dx” 
does not mean to divide a function by a length; it refers to a well-defined limiting process. This notation is however 
constructed so that it provides an intuitive guide, and even if you do think of it as the function df divided by the variable 
dx, you get the right answer. 

Why should such a thing as a differential exist? It's essentially the first terms after the constant in the power series 
representation of the original function: section 2.5. But how to tell if such a series works anyway? I've been notably 
cavalier about proofs. The answer is that there is a proper theorem guaranteeing Eq. (8.14) works. It is that if both 
partial derivatives exist in the neighborhood of the expansion point and if these derivatives are continuous there, then 
the differential exists and has the value that I stated in Eq. (8.13). It has the properties stated in Eq. (8.14). For all this 
refer to one of many advanced calculus texts, such as Apostol's.* 


* Mathematical Analysis, Addison-Wesley 
8.4 Geometric Interpretation 

For one variable, the picture of the differential is simple. Start with a graph of the function and at a point (x,y) = 
(x,f(x)), find the straight line that best approximates the function in the immediate neighborhood of that point. Now 
set up a new coordinate system with origin at this (x,y) and call the new coordinates dx and dy. In this coordinate 
system the straight line passes through the origin and the slope is the derivative df(x)/dx. The equation for the straight 
line is then Eq. (8.11), describing the differential. 

$$ dy = \frac{df(x)}{dx}\,dx $$




For two variables, the picture parallels this one. At a point (x, y, z) = (x, y, f(x, y)) find the plane that best 
approximates the function in the immediate neighborhood of that point. Set up a new coordinate system with origin at 
this (x, y, z) and call the new coordinates dx, dy, and dz. The equation for a plane that passes through this origin is 
α dx + β dy + γ dz = 0, and for this best approximating plane, the equation is nothing more than the equation for the 
differential, Eq. (8.13). 

$$ dz = \left(\frac{\partial f}{\partial x}\right)_y dx + \left(\frac{\partial f}{\partial y}\right)_x dy $$



The picture is a bit harder to draw, but with a little practice you can do it. 

For the case of three independent variables, I'll leave the sketch to you. 

Examples 

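The temperature on the surface of a heated disk is given to be T(r, φ) = T₀ + T₁(1 − r²/a²), where a is the radius of 
the disk and T₀ and T₁ are constants. If you start at position x = c < a, y = 0 and move parallel to the y-axis at speed 
v₀, what is the rate of change of temperature that you feel? 

Use Eq. (8.4), and the relation r = √(x² + y²). 

$$ \begin{aligned}
\frac{dT}{dt} &= \left(\frac{\partial T}{\partial r}\right)_\phi \frac{dr}{dt} + \left(\frac{\partial T}{\partial \phi}\right)_r \frac{d\phi}{dt}
= \left(\frac{\partial T}{\partial r}\right)_\phi \left[\left(\frac{\partial r}{\partial x}\right)_y \frac{dx}{dt} + \left(\frac{\partial r}{\partial y}\right)_x \frac{dy}{dt}\right] \\
&= \left(-2T_1 \frac{r}{a^2}\right)\left[0 + \frac{y}{\sqrt{x^2 + y^2}}\,v_0\right] \\
&= -2T_1 \frac{\sqrt{c^2 + v_0^2 t^2}}{a^2} \cdot \frac{v_0^2 t}{\sqrt{c^2 + v_0^2 t^2}} = -2T_1 \frac{v_0^2 t}{a^2}
\end{aligned} $$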



As a check, the dimensions are correct (are they?). At time zero, this vanishes, and that's what you should expect 
because at the beginning of the motion you're starting to move in the direction perpendicular to the direction in which 
the temperature is changing. The farther you go, the more nearly parallel to the direction of the radius you’re moving. 
If you are moving exactly parallel to the radius, this time-derivative is easier to calculate; it's then almost a problem in 
a single variable. 


$$ \frac{dT}{dt} \approx \frac{dT}{dr}\,\frac{dr}{dt} = -2T_1 \frac{r}{a^2}\,v_0 \approx -2T_1 \frac{v_0 t}{a^2}\,v_0 = -2T_1 \frac{v_0^2 t}{a^2} $$


So the approximate and the exact calculation agree. In fact they agree so well that you should try to find out if this is a 
lucky coincidence or if there is some special aspect of the problem that you might have seen from the beginning and that 
would have made the whole thing much simpler. 
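
A numerical version of this example is easy to set up. In the Python sketch below every number (T₀, T₁, a, c, v₀) is an illustrative value that I made up. The code walks along the line x = c and differentiates the temperature with respect to time by a centered difference.

```python
T0, T1, a, c, v0 = 300.0, 50.0, 1.0, 0.3, 0.2   # illustrative values only

def T(x, y):
    # T = T0 + T1 (1 - r^2/a^2) written in rectangular coordinates
    return T0 + T1 * (1 - (x**2 + y**2) / a**2)

def dTdt_numeric(t, h=1e-6):
    # Position at time t is (c, v0 * t): motion parallel to the y-axis
    return (T(c, v0 * (t + h)) - T(c, v0 * (t - h))) / (2 * h)

def dTdt_formula(t):
    return -2 * T1 * v0**2 * t / a**2

print(dTdt_numeric(2.0), dTdt_formula(2.0))
```

Notice that c never appears in the formula. Changing its value in the sketch leaves the rate unchanged, which is a strong hint toward answering the question just posed.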


8.5 Gradient 

The equation (8.13) for the differential has another geometric interpretation. For a function such as f(x, y) = x² + 4y², 
the equations representing constant values of f describe curves in the x-y plane. In this example, they are ellipses. If 
you start from any fixed point in the plane and start to move away from it, the rate at which the value of f changes will 
depend on the direction in which you move. If you move along the curve defined by f = constant then f won’t change 
at all. If you move perpendicular to that direction then f may change a lot. 

The gradient of f at a point is the vector pointing in the direction in which 
f is increasing most rapidly, and the component of the gradient along that 
direction is the derivative of f with respect to the distance in that direction. 
To relate this to the partial derivatives that we’ve been using, and to understand how to compute and to use the 
gradient, return to Eq. (8.13) and write it in vector form. Use the common notation for the basis: x̂ and ŷ. Then let 

$$ d\vec r = dx\,\hat x + dy\,\hat y \qquad\text{and}\qquad \vec G = \left(\frac{\partial f}{\partial x}\right)_y \hat x + \left(\frac{\partial f}{\partial y}\right)_x \hat y \tag{8.15} $$

The equation for the differential is now 

$$ df = df(x, y, dx, dy) = \vec G \cdot d\vec r \tag{8.16} $$




Because you know the properties of the dot product, you know that this is G dr cos θ and it is largest when the 
directions of $d\vec r$ and of $\vec G$ are the same. It’s zero when they are perpendicular. You also know that df is zero when $d\vec r$ is 
in the direction along the curve where f is constant. The vector $\vec G$ is therefore perpendicular to this curve. It is in the 
direction in which f is changing most rapidly. Also because df = G dr cos θ, you see that G is the derivative of f with 
respect to distance along that direction. $\vec G$ is the gradient. 

For the example f(x, y) = x² + 4y², $\vec G = 2x\,\hat x + 8y\,\hat y$. At each point in the x-y plane it provides a vector showing 
the steepness of f at that point and the direction in which f is changing most rapidly. 



Notice that the gradient vectors are twice as long where the ellipses are closest together as they are at the ends 
where the ellipses are farthest apart. The function changes more rapidly in the y-direction. 
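
Both statements can be tested by brute force. The Python sketch below is a rough illustration only: the function names and the 3600-direction scan are my own choices. It tries many directions at a point, finds the one along which f = x² + 4y² rises fastest, and compares it with the direction of 2x x̂ + 8y ŷ. It also compares the lengths of the gradient at the top and at the end of the ellipse x² + 4y² = 4.

```python
import math

def f(x, y):
    return x**2 + 4 * y**2

def grad(x, y):
    return (2 * x, 8 * y)

def directional_derivative(x, y, angle, h=1e-6):
    # Rate of change of f per unit distance moved in the direction `angle`
    ux, uy = math.cos(angle), math.sin(angle)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

def steepest_angle(x, y, n=3600):
    # Scan n equally spaced directions and keep the one with the largest rate
    angles = [2 * math.pi * k / n for k in range(n)]
    return max(angles, key=lambda ang: directional_derivative(x, y, ang))

gx, gy = grad(1.0, 1.0)
print(steepest_angle(1.0, 1.0), math.atan2(gy, gx))

# Gradient lengths on the ellipse x^2 + 4y^2 = 4: top (0, 1) versus end (2, 0)
print(math.hypot(*grad(0.0, 1.0)), math.hypot(*grad(2.0, 0.0)))
```

The scan can only resolve the direction to within its grid spacing of a tenth of a degree, so expect agreement to about that level and no better.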
The U.S. Coast and Geodetic Survey makes a large number of maps, and hikers are particularly interested in the 
contour maps. They show curves indicating the lines of constant altitude. When Apollo 16 went to the Moon in 1972, 
NASA prepared a similar map for the astronauts, and this is a small segment of that map. The contour lines represent 
10 meter increments in altitude.* 



The gravitational potential energy of a mass m near the Earth’s (or Moon’s) surface is mgh. This divided by m 
is the gravitational potential, gh. These lines of constant altitude are then lines of constant potential, equipotentials of 
the gravitational field. Walk along an equipotential and you are doing no work against gravity, just walking on the level. 

8.6 Electrostatics 

The electric field can be described in terms of a gradient. For a single point charge at the origin the electric field is 

$$ \vec E(x, y, z) = \frac{kq}{r^2}\,\hat r $$

where r̂ is the unit vector pointing away from the origin and r is the distance to the origin. This vector can be written 
as a gradient. Because this $\vec E$ is everywhere pointing away from the origin, it’s everywhere perpendicular to the sphere 
centered at the origin. 

$$ \vec E = -\operatorname{grad}\frac{kq}{r} $$


You can verify this in several ways. The first is to go straight to the definition of a gradient. (There’s a blizzard of 
minus signs in this approach, so have a little patience. It will get better.) This function (1/r) is increasing most rapidly in 
the direction moving toward the origin. The derivative with respect to distance in this direction is −d/dr, so 
−d/dr(1/r) = +1/r². The direction of greatest increase is along −r̂, so grad(1/r) = −r̂(1/r²). But the relation to 
the electric field has another −1 in it, so 

$$ -\operatorname{grad}\frac{kq}{r} = +\hat r\,\frac{kq}{r^2} $$

* history.nasa.gov/alsj/a16/ scan by Robin Wheeler 


There's got to be a better way. 

Yes, instead of insisting that you move in the direction in which the function is increasing most rapidly, simply 
move in the direction in which it is changing most rapidly. The derivative with respect to distance in that direction is 
the component in that direction and the plus or minus signs take care of themselves. The derivative with respect to r 
of (1/r) is −1/r². That is the component in the direction r̂, the direction in which you took the derivative. This says 
grad(1/r) = −r̂(1/r²). You get the same result as before but without so much fussing. This also makes it look more 
like the familiar ordinary derivative in one dimension. 

Still another way is from the Stallone-Schwarzenegger brute force school of computing. Put everything in rectangular 
coordinates and do the partial derivatives using Eqs. (8.15) and (8.6). 


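$$ \left(\frac{\partial (1/r)}{\partial x}\right)_{y,z} = \left(\frac{\partial (1/r)}{\partial r}\right)_{\theta,\phi} \left(\frac{\partial r}{\partial x}\right)_{y,z} = -\frac{1}{r^2}\,\frac{\partial \sqrt{x^2 + y^2 + z^2}}{\partial x} = -\frac{1}{r^2}\,\frac{x}{\sqrt{x^2 + y^2 + z^2}} $$

Repeat this for y and z with similar results and assemble the output. 

$$ -\operatorname{grad}\frac{kq}{r} = \frac{kq}{r^2}\,\frac{x\,\hat x + y\,\hat y + z\,\hat z}{\sqrt{x^2 + y^2 + z^2}} = \frac{kq\,\vec r}{r^3} = \frac{kq\,\hat r}{r^2} $$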
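
For anyone who wants a fourth way, here is a numerical one. This Python sketch uses helper names that are mine and purely illustrative. It differentiates 1/r by centered differences in x, y, and z and compares the result with −r̂/r², written in components as −(x, y, z)/r³.

```python
import math

def inv_r(x, y, z):
    return 1 / math.sqrt(x * x + y * y + z * z)

def grad_numeric(g, p, h=1e-6):
    # Centered difference along each rectangular axis in turn
    out = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        out.append((g(*plus) - g(*minus)) / (2 * h))
    return out

def grad_formula(p):
    # grad(1/r) = -r_hat / r^2, whose components are -(x, y, z) / r^3
    r = math.sqrt(sum(c * c for c in p))
    return [-c / r**3 for c in p]

p = (1.0, 2.0, 2.0)   # a point with r = 3
print(grad_numeric(inv_r, p))
print(grad_formula(p))
```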



The symbol ∇ is commonly used for the gradient operator. This vector operator will appear in several other places; 
the curl of a vector field will be the one you see most often. 

$$ \nabla = \hat x\,\frac{\partial}{\partial x} + \hat y\,\frac{\partial}{\partial y} + \hat z\,\frac{\partial}{\partial z} \tag{8.17} $$

From Eq. (8.15) you have 

$$ \operatorname{grad} f = \nabla f \tag{8.18} $$
8.7 Plane Polar Coordinates 

When doing integrals in the plane there are many coordinate systems to choose from, but rectangular and polar coordinates 
are the most common. You can find the element of area with a simple sketch: The lines (or curves) of constant coordinate 
enclose an area that is, for small enough increments in the coordinates, a rectangle. Then you just multiply the sides. In 
one case Δx · Δy and in the other case Δr · rΔφ. 




Vibrating Drumhead 

A circular drumhead can vibrate in many complicated ways. The simplest and lowest frequency mode is approximately 



$$ z(r, \phi, t) = z_0\bigl(1 - r^2/R^2\bigr)\cos\omega t \tag{8.19} $$

where R is the radius of the drum and ω is the frequency of oscillation. (The shape is more accurately described by 
Eq. (4.22) but this approximation is pretty good for a start.) The kinetic energy density of the moving drumhead is 
u = ½σ(∂z/∂t)². That is, in a small area ΔA, the kinetic energy is ΔK = uΔA and the limit as ΔA → 0 of ΔK/ΔA 
is the area-energy-density. In the same way, σ is the area mass density, dm/dA. 

What is the total kinetic energy because of this oscillation? It is ∫u dA = ∫u d²r. To evaluate it, use polar 
coordinates and integrate over the area of the drumhead. The notation d²r is another notation for dA just as d³r is 
used for a piece of volume. 
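$$ \begin{aligned}
\int u\,dA &= \int_0^R r\,dr \int_0^{2\pi} d\phi\ \frac{\sigma}{2}\Bigl[z_0\bigl(1 - r^2/R^2\bigr)\,\omega\sin\omega t\Bigr]^2 \\
&= \frac{\sigma}{2}\,2\pi z_0^2\,\omega^2 \sin^2\omega t \int_0^R dr\ r\bigl(1 - r^2/R^2\bigr)^2 \\
&= \sigma\pi z_0^2\,\omega^2 \sin^2\omega t\ \frac{1}{2}\int_{r=0}^{r=R} d(r^2)\,\bigl(1 - r^2/R^2\bigr)^2 \\
&= \sigma\pi z_0^2\,\omega^2 \sin^2\omega t\ \frac{1}{2}\,R^2\,\frac{1}{3}\bigl(1 - r^2/R^2\bigr)^3 (-1)\,\Big|_{r=0}^{r=R} \\
&= \frac{1}{6}\,\sigma R^2 \pi z_0^2\,\omega^2 \sin^2\omega t
\end{aligned} \tag{8.20} $$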


See problem 8.10 and following for more on this.* 
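
If you distrust the algebra, add up the rings numerically. The Python sketch below is a minimal check. It uses a midpoint rule with a number of rings that I picked arbitrarily, and the parameter values in the call are made up. It sums the energy density over thin rings of area 2πr dr and compares the total with Eq. (8.20).

```python
import math

def kinetic_energy_numeric(sigma, z0, omega, R, t, n=2000):
    # Midpoint rule in r; by symmetry the phi integral contributes a factor 2*pi
    dr = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        dzdt = -z0 * (1 - r**2 / R**2) * omega * math.sin(omega * t)
        u = 0.5 * sigma * dzdt**2            # kinetic energy per area
        total += u * 2 * math.pi * r * dr    # area of the ring is 2 pi r dr
    return total

def kinetic_energy_formula(sigma, z0, omega, R, t):
    return sigma * math.pi * R**2 * z0**2 * omega**2 * math.sin(omega * t)**2 / 6

args = (1.2, 0.01, 50.0, 0.3, 0.013)   # sigma, z0, omega, R, t: illustrative values
print(kinetic_energy_numeric(*args), kinetic_energy_formula(*args))
```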


8.8 Cylindrical, Spherical Coordinates 

The three common coordinate systems used in three dimensions are rectangular, cylindrical, and spherical coordinates, 
and these are the ones you have to master. When you need to use prolate spheroidal coordinates you can look them up. 





rectangular          cylindrical          spherical 
−∞ < x < ∞           0 < r < ∞            0 < r < ∞ 
−∞ < y < ∞           0 < φ < 2π           0 < θ < π 
−∞ < z < ∞           −∞ < z < ∞           0 < φ < 2π 

The surfaces that have constant values of these coordinates are planes in rectangular coordinates; planes and 
cylinders in cylindrical; planes, spheres, and cones in spherical. In every one of these cases the constant-coordinate 
surfaces intersect each other at right angles, hence the name “orthogonal coordinate” systems. In spherical coordinates 
I used the coordinate θ as the angle from the z-axis and φ as the angle around the axis. In mathematics books these 
are typically reversed, so watch out for the notation. On the globe of the Earth, φ is like the longitude and θ like the 
latitude except that longitude goes 0 to 180° East and 0 to 180° West from the Greenwich meridian instead of zero to 
2π. Latitude is 0 to 90° North or South from the equator instead of zero to π from the pole. Except for the North-South 
terminology, latitude is 90° − θ. 

* For some animations showing these oscillations and others, check out 
www.physics.miami.edu/nearing/mathmethods/drumhead-animations.html 

The volume elements for these systems come straight from the drawings, just as the area elements do in plane 
coordinates. In every case you can draw six surfaces, bounded by constant coordinates, and surrounding a small box. 
Because these are orthogonal coordinates you can compute the volume of the box easily as the product of its three edges. 

In the spherical case, one side is Δr. Another side is rΔθ. The third side is not rΔφ; it is r sin θ Δφ. The reason 
for the factor sin θ is that the arc of the circle made at constant r and constant θ is not in a plane passing through the 
origin. It is in a plane parallel to the x-y plane, so it has a radius r sin θ. 

                rectangular     cylindrical               spherical 
volume  d³r  =  dx dy dz        r dr dφ dz                r² sin θ dr dθ dφ 
area    d²r  =  dx dy           r dφ dz  or  r dφ dr      r² sin θ dθ dφ 


Examples of Multiple Integrals 

Even in rectangular coordinates integration can be tricky. That’s because you have to pay attention to the limits of 
integration far more closely than you do for simple one dimensional integrals. I'll illustrate this with two dimensional 
rectangular coordinates first, and will choose a problem that is easy but still shows what you have to look for. 


An Area 

Find the area in the x-y plane between the curves y = x²/a and y = x. 

$$ \text{(A)}\quad \int_0^a dx \int_{x^2/a}^{x} dy\ 1 \qquad\qquad \text{(B)}\quad \int_0^a dy \int_{y}^{\sqrt{ay}} dx\ 1 $$



In the first instance I fix x and add the pieces of dy in the strip indicated. The lower limit of the dy integral 
comes from the specified equation of the lower curve. The upper limit is the value of y for the given x at the upper 
curve. After that the limits on the sum over dx come from the intersection of the two curves: y = x = x²/a gives 
x = a for that limit. 

In the second instance I fix y and sum over dx first. The left limit is easy, x = y, and the upper limit comes from 
solving y = x²/a for x in terms of y. When that integral is done, the remaining dy integral starts at zero and goes up 
to the intersection at y = x = a. 

Now do the integrals. 



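$$ \text{(A)}\quad \int_0^a dx\,\bigl[x - x^2/a\bigr] = \frac{a^2}{2} - \frac{a^3}{3a} = \frac{a^2}{6} $$

$$ \text{(B)}\quad \int_0^a dy\,\bigl[\sqrt{ay} - y\bigr] = a^{1/2}\,\frac{a^{3/2}}{3/2} - \frac{a^2}{2} = \frac{a^2}{6} $$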

If you would care to try starting this calculation from the beginning, without drawing any pictures, be my guest. 
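
The two orders of integration can also be compared numerically. In this Python sketch the function names and the number of strips are mine. Each function does the inner integral exactly, which is just the length of the strip, and then adds up the strips with a midpoint rule.

```python
import math

def area_dy_first(a, n=2000):
    # (A): for each x the strip runs from y = x^2/a up to y = x
    dx = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += (x - x**2 / a) * dx
    return total

def area_dx_first(a, n=2000):
    # (B): for each y the strip runs from x = y out to x = sqrt(a*y)
    dy = a / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy
        total += (math.sqrt(a * y) - y) * dy
    return total

a = 3.0
print(area_dy_first(a), area_dx_first(a), a**2 / 6)
```

Version (B) converges more slowly because the square root has an infinite slope at y = 0, but both approach a²/6.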





A Moment of Inertia 

The moment of inertia about an axis is ∫ r⊥² dm. Here, r⊥ is the perpendicular distance to the axis. What is the 
moment of inertia of a uniform sheet of mass M in the shape of a right triangle of sides a and b? Take the moment 
about the right angled vertex. The area mass density, σ = dm/dA, is 2M/ab. The moment of inertia is then 





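$$ \begin{aligned}
\int (x^2 + y^2)\,\sigma\,dA &= \int_0^a dx \int_0^{b(a-x)/a} dy\ \sigma(x^2 + y^2) = \int_0^a dx\ \sigma\Bigl[x^2 y + y^3/3\Bigr]_0^{b(a-x)/a} \\
&= \int_0^a dx\ \sigma\left[x^2\,\frac{b}{a}(a - x) + \frac{1}{3}\left(\frac{b}{a}\right)^3 (a - x)^3\right] \\
&= \sigma\left[\frac{b}{a}\left(\frac{a^4}{3} - \frac{a^4}{4}\right) + \frac{1}{3}\,\frac{b^3}{a^3}\,\frac{a^4}{4}\right] \\
&= \frac{1}{12}\,\sigma\bigl(ba^3 + ab^3\bigr) = \frac{M}{6}\bigl(a^2 + b^2\bigr)
\end{aligned} $$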


The dimensions are correct. For another check take the case where a = 0, reducing this to Mb²/6. But wait, this now 
looks like a thin rod, and I remember that the moment of inertia of a thin rod about its end is Mb²/3. What went 
wrong? Nothing. Look again more closely. Show why this limiting answer ought to be less than Mb²/3. 
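
Here is a numerical cross-check of the triangle result, as a Python sketch under my own choice of test values for M, a, and b. The inner y integral is done exactly for each strip and the strips are summed with a midpoint rule in x.

```python
def triangle_inertia_numeric(M, a, b, n=400):
    sigma = 2 * M / (a * b)          # area mass density of the uniform sheet
    dx = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        ymax = b * (a - x) / a       # the hypotenuse bounds the strip at this x
        # Inner integral of (x^2 + y^2) dy from 0 to ymax, done exactly
        total += sigma * (x**2 * ymax + ymax**3 / 3) * dx
    return total

def triangle_inertia_formula(M, a, b):
    return M * (a**2 + b**2) / 6

print(triangle_inertia_numeric(1.5, 2.0, 3.0), triangle_inertia_formula(1.5, 2.0, 3.0))
```

Try it with a very small a as well. The numerical sum heads toward Mb²/6, not Mb²/3, and looking at how the mass is distributed along the length of the narrow triangle tells you why.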


Volume of a Sphere 

What is the volume of a sphere of radius R? The most obvious approach would be to use spherical coordinates. See 
problem 8.16 for that. I’ll use cylindrical coordinates instead. The element of volume is dV = r dr dφ dz, and the 
integrals can be done a couple of ways. 

$$ \int d^3r = \int_0^R r\,dr \int_0^{2\pi} d\phi \int_{-\sqrt{R^2 - r^2}}^{+\sqrt{R^2 - r^2}} dz = \int_{-R}^{+R} dz \int_0^{2\pi} d\phi \int_0^{\sqrt{R^2 - z^2}} r\,dr \tag{8.21} $$


You can finish these now, see problem 8.17. 

A Surface Charge Density 

An example that appears in electrostatics: The surface charge density, dq/dA, on a sphere of radius R is σ(θ, φ) = 
σ₀ sin²θ cos²φ. What is the total charge on the sphere? 

The element of area is R² sin θ dθ dφ, so the total charge is ∫σ dA, 

$$ Q = \int_0^{\pi} \sin\theta\,d\theta\ R^2 \int_0^{2\pi} d\phi\ \sigma_0 \sin^2\theta \cos^2\phi = R^2 \int_{-1}^{+1} d\cos\theta\ \sigma_0\bigl(1 - \cos^2\theta\bigr) \int_0^{2\pi} d\phi\ \cos^2\phi $$

The mean value of cos² is 1/2, so the φ integral gives π. For the rest, it is 

$$ \sigma_0 \pi R^2 \left[\cos\theta - \frac{1}{3}\cos^3\theta\right]_{-1}^{+1} = \frac{4}{3}\,\sigma_0 \pi R^2 $$
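
A direct numerical sum over the sphere gives the same thing. This Python sketch is only a check, with a midpoint rule, grid sizes, and test values for σ₀ and R that are all my own choices. It adds up σ dA over a grid in θ and φ.

```python
import math

def total_charge_numeric(sigma0, R, n_theta=200, n_phi=200):
    dth = math.pi / n_theta
    dph = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            sigma = sigma0 * math.sin(th)**2 * math.cos(ph)**2
            total += sigma * R**2 * math.sin(th) * dth * dph   # dA = R^2 sin(theta) dtheta dphi
    return total

def total_charge_formula(sigma0, R):
    return 4 * math.pi * sigma0 * R**2 / 3

print(total_charge_numeric(2.0, 1.5), total_charge_formula(2.0, 1.5))
```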
Limits of Integration 

Sometimes the trickiest part of multiple integrals is determining the limits of integration. Especially when you have to 
change the order of integration, the new limits may not be obvious. Are there any special techniques or tricks to doing 
this? Yes, there is one, perhaps obscure, method that you may not be accustomed to. 

Draw Pictures. 

If you have an integral such as the first one, you have to draw a picture of the integration domain to switch limits. 




$$ \int dx \int_0^{x} dy \; + \; \cdots \tag{8.22} $$


Of course, once you've drawn the picture you may realize that simply interchanging the order of integration won’t help, 
but that polar coordinates may. 



8.9 Vectors: Cylindrical, Spherical Bases 

When you describe vectors in three dimensions are you restricted to the basis x̂, ŷ, ẑ? In a different coordinate 
system you should use basis vectors that are adapted to that system. In rectangular coordinates these vectors have the 
convenient property that they point along the direction perpendicular to the plane where the corresponding coordinate is 
constant. They also point in the direction in which the other two coordinates are constant. E.g. the unit vector x̂ points 
perpendicular to the plane of constant x (the y-z plane); it also points along the line where y and z are constant. 

Do the same thing for cylindrical coordinates. The unit vector ẑ points perpendicular to the x-y plane. The 
unit vector r̂ points perpendicular to the cylinder r = constant. The unit vector φ̂ points perpendicular to the plane 
φ = constant and along the direction for which r and z are constant. The conventional right-hand rule specifies ẑ = r̂ × φ̂. 

For spherical coordinates r̂ points perpendicular to the sphere r = constant. The φ̂ vector is perpendicular to the 
plane φ = constant and points along the direction where r = constant and θ = constant and toward increasing coordinate 
φ. Finally θ̂ is perpendicular to the cone θ = constant and again, points toward increasing θ. Then φ̂ = r̂ × θ̂, and on 
the Earth, these vectors r̂, θ̂, and φ̂ are up, South, and East. 

Solenoid 

A standard solenoid is a cylindrical coil of wire, so that when the wire carries a current it produces a magnetic field. To 
describe this field, it seems that cylindrical coordinates are advised. Until you know something about the field the most 
general thing that you can write is 

$$ \vec B(r, \phi, z) = \hat r\,B_r(r, \phi, z) + \hat\phi\,B_\phi(r, \phi, z) + \hat z\,B_z(r, \phi, z) $$

In a real solenoid that’s it; all three of these components are present. If you have an ideal, infinitely long solenoid, 
with the current going strictly around in the φ̂ direction (found only in textbooks), the use of Maxwell’s equations and 
appropriately applied symmetry arguments will simplify this to ẑ B_z(r). 

Gravitational Field 

The gravitational field of the Earth is simple, $\vec g = -\hat r\,GM/r^2$, pointing straight toward the center of the Earth. Well no, 
not really. The Earth has a bulge at the equator; its equatorial diameter is about 43 km larger than its polar diameter. 
This changes the g-field so that it has a noticeable θ component. At least it’s noticeable if you’re trying to place a 
satellite in orbit or to send a craft to another planet. 

A better approximation to the gravitational field of the Earth is 

$$ \vec g = \cdots\Bigl[\hat r\,\bigl(3\cos^2\theta - 1\bigr)/2 + \hat\theta\,\cos\theta\sin\theta\Bigr] \tag{8.23} $$

The letter Q stands for the quadrupole moment. |Q| ≪ MR², and it’s a measure of the bulge. By convention a football 
(American football) has a positive Q; the Earth’s Q is negative. (What about a European football?) 

Nuclear Magnetic Field 

The magnetic field from the nucleus of many atoms (even as simple an atom as hydrogen) is proportional to 

$$ \frac{1}{r^3}\Bigl[2\,\hat r\cos\theta + \hat\theta\sin\theta\Bigr] \tag{8.24} $$

As with the preceding example these are in spherical coordinates, and the component along the φ̂ direction is zero. This 
field’s effect on the electrons in the atom is small but detectable. The magnetic properties of the nucleus are central to 
the subject of nuclear magnetic resonance (NMR), and that has its applications in magnetic resonance imaging* (MRI). 

8.10 Gradient in other Coordinates 

The equation for the gradient computed in rectangular coordinates is Eq. (8.15) or (8.18). How do you compute it in
cylindrical or spherical coordinates? You do it the same way that you got Eq. (8.15) from Eq. (8.13). The coordinates
$r$, $\phi$, and $z$ are just more variables, so Eq. (8.13) is simply

$$df = df(r, \phi, z, dr, d\phi, dz) = \left(\frac{\partial f}{\partial r}\right)_{\phi,z} dr + \left(\frac{\partial f}{\partial \phi}\right)_{r,z} d\phi + \left(\frac{\partial f}{\partial z}\right)_{r,\phi} dz \tag{8.25}$$

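All that's left is to write $d\vec r$ in these coordinates, just as in Eq. (8.15).

$$d\vec r = \hat r\,dr + \hat\phi\,r\,d\phi + \hat z\,dz \tag{8.26}$$

The part in the $\hat\phi$ direction is the displacement of $d\vec r$ in that direction. As $\phi$ changes by a small amount the distance
moved is not $d\phi$; it is $r\,d\phi$. The equation

$$df = df(r, \phi, z, dr, d\phi, dz) = \operatorname{grad} f \cdot d\vec r$$

combined with the two equations (8.25) and (8.26) gives grad $f$ as

$$\operatorname{grad} f = \hat r\,\frac{\partial f}{\partial r} + \hat\phi\,\frac{1}{r}\frac{\partial f}{\partial \phi} + \hat z\,\frac{\partial f}{\partial z} = \nabla f \tag{8.27}$$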

Notice that the units work out right too. 

In spherical coordinates the procedure is identical. All that you have to do is to identify what $d\vec r$ is.

$$d\vec r = \hat r\,dr + \hat\theta\,r\,d\theta + \hat\phi\,r\sin\theta\,d\phi$$

Again with this case you have to look at the distance moved when the coordinates change by a small amount. Just as
with cylindrical coordinates this determines the gradient in spherical coordinates.

$$\operatorname{grad} f = \hat r\,\frac{\partial f}{\partial r} + \hat\theta\,\frac{1}{r}\frac{\partial f}{\partial\theta} + \hat\phi\,\frac{1}{r\sin\theta}\frac{\partial f}{\partial\phi} = \nabla f \tag{8.28}$$

The equations (8.15), (8.27), and (8.28) define the gradient (and correspondingly $\nabla$) in three coordinate systems.
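That the rectangular and cylindrical forms describe the same vector can be checked numerically. The following is only a
sketch under stated assumptions: the sample field `f_rect`, the step size `h`, and every function name are chosen here for
illustration and do not come from the text. It differentiates one function along $x$, $y$, $z$ and again along $r$, $\phi$, $z$
with the $1/r$ factor of the cylindrical gradient, then rotates the cylindrical components back to rectangular ones.

```python
import math

def f_rect(x, y, z):
    # Sample scalar field (an arbitrary choice for this check).
    return x * x * y + z * math.sin(x)

def grad_rect(x, y, z, h=1e-6):
    # Central differences along x, y, z.
    return (
        (f_rect(x + h, y, z) - f_rect(x - h, y, z)) / (2 * h),
        (f_rect(x, y + h, z) - f_rect(x, y - h, z)) / (2 * h),
        (f_rect(x, y, z + h) - f_rect(x, y, z - h)) / (2 * h),
    )

def f_cyl(r, phi, z):
    # The same field written in cylindrical coordinates.
    return f_rect(r * math.cos(phi), r * math.sin(phi), z)

def grad_cyl(r, phi, z, h=1e-6):
    # Components along r-hat, phi-hat, z-hat.
    # The phi component is (1/r) df/dphi because the distance moved is r dphi.
    df_dr = (f_cyl(r + h, phi, z) - f_cyl(r - h, phi, z)) / (2 * h)
    df_dphi = (f_cyl(r, phi + h, z) - f_cyl(r, phi - h, z)) / (2 * h)
    df_dz = (f_cyl(r, phi, z + h) - f_cyl(r, phi, z - h)) / (2 * h)
    return (df_dr, df_dphi / r, df_dz)

def cyl_to_rect_components(g, phi):
    # r-hat = (cos phi, sin phi, 0), phi-hat = (-sin phi, cos phi, 0).
    g_r, g_phi, g_z = g
    return (
        g_r * math.cos(phi) - g_phi * math.sin(phi),
        g_r * math.sin(phi) + g_phi * math.cos(phi),
        g_z,
    )
```

Leaving out the division by `r` in `grad_cyl` makes the two results disagree everywhere except on the cylinder $r = 1$, which
is a quick way to see why the factor is there.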


* In medicine MRI was originally called NMR, but someone decided that this would disconcert the patients.

8.11 Maxima, Minima, Saddles

With one variable you can look for a maximum or a minimum by taking a derivative and setting it to zero. For several 
variables you do it several times so that you will get as many equations as you have unknown coordinates. 

Put this in the language of gradients: $\nabla f = 0$. The derivative of $f$ vanishes in every direction as you move from
such a point. As examples,

$$f(x, y) = x^2 + y^2, \qquad\text{or}\qquad = -x^2 - y^2, \qquad\text{or}\qquad = x^2 - y^2$$

For all three of these the gradient is zero at $(x, y) = (0, 0)$; the first has a minimum there, the second a maximum, and
the third neither: it is a "saddle point." Draw a picture to see the reason for the name. The generic term for all three
of these is "critical point."

An important example of finding a minimum is "least square fitting" of functions. How close are two functions
to each other? The most commonly used, and in every way the simplest, definition of the distance (squared) between $f$
and $g$ on the interval $a < x < b$ is

$$\int_a^b dx\,\big|f(x) - g(x)\big|^2 \tag{8.29}$$

This means that a large deviation of one function from the other in a small region counts more than smaller deviations
spread over a larger domain. The square sees to that. As a specific example, take a function $f$ on the interval $0 < x < L$
and try to fit it to the sum of a couple of trigonometric functions. The best fit will be the one that minimizes the distance
between $f$ and the sum. (Take $f$ to be a real-valued function for now.)

$$D^2(\alpha, \beta) = \int_0^L dx \left(f(x) - \alpha\sin\frac{\pi x}{L} - \beta\sin\frac{2\pi x}{L}\right)^2 \tag{8.30}$$

$D$ is the distance between the given function and the sines used to fit it. To minimize the distance, take derivatives with
respect to the parameters $\alpha$ and $\beta$.


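$$\frac{\partial D^2}{\partial\alpha} = -2\int_0^L dx \left(f(x) - \alpha\sin\frac{\pi x}{L} - \beta\sin\frac{2\pi x}{L}\right)\sin\frac{\pi x}{L} = 0$$

$$\frac{\partial D^2}{\partial\beta} = -2\int_0^L dx \left(f(x) - \alpha\sin\frac{\pi x}{L} - \beta\sin\frac{2\pi x}{L}\right)\sin\frac{2\pi x}{L} = 0$$

These two equations determine the parameters $\alpha$ and $\beta$.

$$\alpha\int_0^L dx\,\sin^2\frac{\pi x}{L} = \int_0^L dx\,f(x)\sin\frac{\pi x}{L}, \qquad \beta\int_0^L dx\,\sin^2\frac{2\pi x}{L} = \int_0^L dx\,f(x)\sin\frac{2\pi x}{L}$$
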

The other integrals vanish because of the orthogonality of $\sin \pi x/L$ and $\sin 2\pi x/L$ on this interval. What you get is
exactly the coefficients of the Fourier series expansion of $f$. The Fourier series is the best fit (in the least square sense)
of a sum of orthogonal functions to $f$. See section 11.6 for more on this.

Is it a minimum? Yes. Look at the coefficients of $\alpha^2$ and $\beta^2$ in Eq. (8.30). They are positive; $+\alpha^2 + \beta^2$ has a
minimum, not a maximum or saddle point, and there is no cross term in $\alpha\beta$ to mess it up.
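The fit is easy to try numerically. This is a minimal sketch, assuming the sample function $f(x) = x(L - x)$ with $L = 1$ and
a plain midpoint rule for the integrals; those choices and all the names below are illustrative, not part of the text. It
computes $\alpha$ and $\beta$ from the two decoupled equations and lets you confirm that moving either coefficient away from its
best-fit value increases $D^2$.

```python
import math

L = 1.0  # length of the interval (assumed value)

def f(x):
    # Sample function to be fitted (an assumption for this sketch).
    return x * (L - x)

def integrate(g, a, b, n=2000):
    # Midpoint rule with n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def mode(m, x):
    # The fitting functions sin(m pi x / L), m = 1, 2.
    return math.sin(m * math.pi * x / L)

def best_fit():
    # alpha and beta, each from its own equation because the modes are orthogonal.
    coeffs = []
    for m in (1, 2):
        top = integrate(lambda x: f(x) * mode(m, x), 0.0, L)
        bottom = integrate(lambda x: mode(m, x) ** 2, 0.0, L)
        coeffs.append(top / bottom)
    return tuple(coeffs)

def distance_squared(alpha, beta):
    # D^2 of Eq. (8.30) for the sample function.
    return integrate(
        lambda x: (f(x) - alpha * mode(1, x) - beta * mode(2, x)) ** 2, 0.0, L)
```

For this particular $f$ the exact Fourier sine coefficients are $8L^2/\pi^3$ for the first mode and zero for the second (the
function is symmetric about $x = L/2$ and $\sin 2\pi x/L$ is antisymmetric), so the numerical result has something to be compared against.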

The distance function Eq. (8.29) is simply (the square of) the norm in the vector space sense of the difference of
the two vectors $f$ and $g$. Equations (6.12) and (6.7) here become

$$\|f - g\|^2 = \langle f - g, f - g\rangle = \int_a^b dx\,\big|f(x) - g(x)\big|^2$$

The geometric meaning of Eq. (8.30) is that $\vec e_1$ and $\vec e_2$ provide a basis for the two dimensional space

$$\alpha\,\vec e_1 + \beta\,\vec e_2 = \alpha\sin\frac{\pi x}{L} + \beta\sin\frac{2\pi x}{L}$$

The plane is the set of all linear combinations of the two vectors, and for a general vector not in this plane, the shortest
distance to the plane defines the vector in the plane that is the best fit to the given vector. It's the one that's closest.
Because the vectors $\vec e_1$ and $\vec e_2$ are orthogonal it makes it easy to find the closest vector. You require that the difference,
$\vec v - \alpha\,\vec e_1 - \beta\,\vec e_2$ has only an $\vec e_3$ component. That is Fourier series.

Hessian

In this example leading to Fourier components, it's pretty easy to see that you are dealing with a minimum and not
anything else. In other situations it may not be so easy. You may have a lot of variables. You may have complicated
cross terms. Is $x^2 + xy + y^2$ a minimum at the origin? Is $x^2 + 3xy + y^2$? (Yes and No respectively.)

When there’s just one variable there is a simple rule that lets you decide. Check the second derivative. If it's 
positive you have a minimum; if it’s negative you have a maximum. If it's zero you have more work to do. Is there a 
similar method for several variables? Yes, and I’ll show it explicitly for two variables. Once you see how to do it in two 
dimensions, the generalization to N is just a matter of how much work you’re willing to do (or how much computer time 
you can use). 

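The Taylor series in two variables, Eq. (2.16), is to second order

$$f(x + dx, y + dy) = f(x, y) + \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2}dx^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}dx\,dy + \frac{\partial^2 f}{\partial y^2}dy^2\right) + \cdots$$

Write this in a more compact notation in order to emphasize the important parts.

$$f(\vec r + d\vec r) - f(\vec r) = \nabla f \cdot d\vec r + \tfrac{1}{2}\langle d\vec r, H\,d\vec r\rangle + \cdots$$
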


The part with the gradient is familiar, and to have either a minimum or a maximum, that will have to be zero. The next 
term introduces a new idea, the Hessian, constructed from all the second derivative terms. Write these second order 
terms as a matrix to see what they are, and in order to avoid a lot of clumsy notation use subscripts as an abbreviation 
for the partial derivatives. 


$$\langle d\vec r, H\,d\vec r\rangle = \begin{pmatrix} dx & dy \end{pmatrix}\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy}\end{pmatrix}\begin{pmatrix} dx \\ dy\end{pmatrix} \qquad\text{where}\qquad d\vec r = \hat x\,dx + \hat y\,dy \tag{8.31}$$

This matrix is symmetric because of the properties of mixed partials. How do I tell from this whether the function
$f$ has a minimum or a maximum (or neither) at a point where the gradient of $f$ is zero? Eq. (8.31) describes a function
of two variables even after I've fixed the values of $x$ and $y$ by saying that $\nabla f = 0$. It is a quadratic function of $dx$
and $dy$. Expressed in the language of vectors this says that $f$ has a minimum if (8.31) is positive no matter what the
direction of $d\vec r$ is; that is, $H$ is positive definite.

Pull back from the problem a step. This is a 2 × 2 symmetric matrix sandwiched inside a scalar product.

$$h(x, y) = \begin{pmatrix} x & y\end{pmatrix}\begin{pmatrix} a & b \\ b & c\end{pmatrix}\begin{pmatrix} x \\ y\end{pmatrix} \tag{8.32}$$

Is $h$ positive definite? That is, positive for all $x$, $y$? If this matrix is diagonal it's much easier to see what is happening,
so diagonalize it. Find the eigenvectors and use those for a basis.

$$\begin{pmatrix} a & b \\ b & c\end{pmatrix}\begin{pmatrix} x \\ y\end{pmatrix} = \lambda\begin{pmatrix} x \\ y\end{pmatrix} \qquad\text{requires}\qquad \lambda^2 - \lambda(a + c) + ac - b^2 = 0$$

$$\lambda = \Big[(a + c) \pm \sqrt{(a - c)^2 + 4b^2}\,\Big]\Big/2 \tag{8.33}$$

For the applications here all the $a$, $b$, $c$ are the real partial derivatives, so the eigenvalues are real and the only question
is whether the $\lambda$s are positive or negative, because they will be the (diagonal) components of the Hessian matrix in the
new basis. If this is a double root, the matrix was already diagonal. You can verify that the eigenvalues are positive if
$a > 0$, $c > 0$, and $ac > b^2$, and that will indicate a minimum point.

Geometrically the equation z = h(x,y ) from Eq. (8.32) defines a surface. If it is positive definite the surface is a 
paraboloid opening upward. If negative definite it is a paraboloid opening down. The mixed case is a hyperboloid — a 
saddle. 

In this 2x2 case you have a quadratic formula to fall back on, and with more variables there are standard 
algorithms for determining eigenvalues of matrices, but I'll leave those to some other book. 
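For the 2 × 2 case the test fits in a few lines. The sketch below is illustrative only (the function names and the
"undetermined" label for a zero eigenvalue are assumptions made here); it takes the three independent second derivatives
$a = f_{xx}$, $b = f_{xy}$, $c = f_{yy}$ at a critical point and classifies the point from the signs of the two eigenvalues.

```python
import math

def eigenvalues_2x2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]], smaller one first.
    mean = (a + c) / 2
    half_gap = math.sqrt((a - c) ** 2 + 4 * b * b) / 2
    return mean - half_gap, mean + half_gap

def classify(a, b, c):
    # Classify a critical point from its Hessian entries.
    lo, hi = eigenvalues_2x2(a, b, c)
    if lo > 0:
        return "minimum"
    if hi < 0:
        return "maximum"
    if lo < 0 < hi:
        return "saddle"
    return "undetermined"  # a zero eigenvalue: second order is not enough
```

For $x^2 + xy + y^2$ the second derivatives are $(a, b, c) = (2, 1, 2)$ with eigenvalues 1 and 3, a minimum. For
$x^2 + 3xy + y^2$ they are $(2, 3, 2)$ with eigenvalues $-1$ and 5, a saddle.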

8.12 Lagrange Multipliers 

This is an incredibly clever method to handle problems of maxima and minima in several variables when there are 
constraints. 

An example: “What is the largest rectangle?” obviously has no solution, but “What is the largest rectangle 
contained in an ellipse?" does. 

Another: Particles are to be placed into states of specified energies. You know the total number of particles; you 
know the total energy. All else being equal, what is the most probable distribution of the number of particles in each 
state? 

I'll describe this procedure for two variables; it's the same for more. The problem stated is that I want to find
the maximum (or minimum) of a function $f(x, y)$ given the fact that the coordinates $x$ and $y$ must lie on the curve
$\phi(x, y) = 0$. If you can solve the $\phi$ equation for $y$ in terms of $x$ explicitly, then you can substitute it into $f$ and turn it
into a problem in ordinary one variable calculus. What if you can't?

Analyze this graphically. The equation $\phi(x, y) = 0$ represents one curve in the plane. The succession of equations
$f(x, y)$ = constant represent many curves in the plane, one for each constant. Think of equipotentials.

Look at the intersections of the $\phi$-curve and the $f$-curves. Where they intersect, they will usually cross each other.
Ask if such a crossing could possibly be a point where $f$ is a maximum. Clearly the answer is no, because as you move
along the $\phi$-curve you're then moving from a point where $f$ has one value to where it has another.

The one way to have $f$ be a maximum at a point on the $\phi$-curve is for the two curves to touch and not to cross.
When that happens the values of $f$ will increase as you approach the point from one side and decrease on the other.
That makes it a maximum. In this sketch, the values of $f$ decrease from 4 to 3 to 2 and then back to 3, 4, and 5. This
point where the curve $f = 2$ touches the $\phi = 0$ curve is then a minimum of $f$ along $\phi = 0$.

To implement this picture so that you can compute with it, look at the gradient of $f$ and the gradient of $\phi$. The
gradient vectors are perpendicular to the curves $f$ = constant and $\phi$ = constant respectively, and at the point where the
curves are tangent to each other these gradients are in the same direction (or opposite, no matter). Either way one
vector is a scalar times the other.

$$\nabla f = \lambda\nabla\phi \tag{8.34}$$

In the second picture, the arrows are the gradient vectors for $f$ and for $\phi$. Break this into components and you have

$$\frac{\partial f}{\partial x} - \lambda\frac{\partial\phi}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} - \lambda\frac{\partial\phi}{\partial y} = 0, \qquad \phi(x, y) = 0$$

There are three equations in three unknowns $(x, y, \lambda)$, and these are the equations to solve for the position of the
maximum or minimum value of $f$. You are looking for $x$ and $y$, so you'll be tempted to ignore the third variable $\lambda$ and
to eliminate it. Look again. This parameter, the Lagrange multiplier, has a habit of being significant.

Examples of Lagrange Multipliers 

The first example that I mentioned: What is the largest rectangle that you can inscribe in an ellipse? Let the ellipse
and the rectangle be centered at the origin. If the upper right corner of the rectangle is at $(x, y)$, then the area of the
rectangle is

$$\text{Area} = f(x, y) = 4xy, \qquad\text{with constraint}\qquad \phi(x, y) = \frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 = 0$$

The equations to solve are now

$$\nabla(f - \lambda\phi) = 0, \qquad\text{and}\qquad \phi = 0,$$

which become

$$4y - \lambda\frac{2x}{a^2} = 0, \qquad 4x - \lambda\frac{2y}{b^2} = 0, \qquad \frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 = 0 \tag{8.35}$$

The solutions to these three equations are straightforward. They are $x = a/\sqrt 2$, $y = b/\sqrt 2$, $\lambda = 2ab$. The maximum
area is then $4xy = 2ab$. The Lagrange multiplier turns out to be the required area. Does this reduce to the correct
result for a circle?
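One way to check this result without using the multiplier at all is brute force: walk along the ellipse and keep the
largest area found. The sketch below assumes the parametrization $x = a\cos t$, $y = b\sin t$ for the corner in the first quadrant
and a uniform grid in $t$; the function names and the grid size are illustrative choices. The multiplier is then recovered
from the first of the three equations, $\lambda = 4y\,a^2/2x$.

```python
import math

def largest_rectangle(a, b, n=100000):
    # Scan the first-quadrant arc of the ellipse for the corner (x, y) of largest area.
    best_area, best_x, best_y = 0.0, 0.0, 0.0
    for k in range(n + 1):
        t = 0.5 * math.pi * k / n
        x, y = a * math.cos(t), b * math.sin(t)
        area = 4 * x * y
        if area > best_area:
            best_area, best_x, best_y = area, x, y
    return best_area, best_x, best_y

def multiplier(a, x, y):
    # Solve 4y - lambda * 2x / a^2 = 0 for lambda.
    return 2 * a * a * y / x
```

With $a = 3$ and $b = 2$ the scan should land near $x = 3/\sqrt 2$, $y = 2/\sqrt 2$ with area $2ab = 12$, and the multiplier computed
from that corner should agree with the area.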

The second example said that you have several different allowed energies, typical of what happens in quantum 
mechanics. If the total number of particles and the total energy are given, how are the particles distributed among the 
different energies? 

If there are $N$ particles and exactly two energy levels, $E_1$ and $E_2$,

$$N = n_1 + n_2, \qquad\text{and}\qquad E = n_1 E_1 + n_2 E_2$$

you have two equations in two unknowns and all you have to do is solve them for the numbers $n_1$ and $n_2$, the number
of particles in each state. If there are three or more possible energies the answer isn't uniquely determined by just two
equations, and there can be many ways that you can put particles into different energy states and still have the same
number of particles and the same total energy.

If you're dealing with four particles and three energies, you can perhaps count the possibilities by hand. How
many ways can you put four particles in three states? (400), (310), (301), (220), (211), etc. There's only one way to
get the (400) configuration: All four particles go into state 1. For (310) there are four ways to do it; any one of the four
particles can be in the second state and the rest in the first. Keep going. If you have $10^{20}$ particles you have to find a
better way.

If you have a total of $N$ particles and you place $n_1$ of them in the first state, the number of ways that you can do
that is $N$ for the first particle, $(N - 1)$ for the second particle, etc. $= N(N - 1)(N - 2)\cdots(N - n_1 + 1) = N!/(N - n_1)!$.
This is over-counting because you don't care which one went into the first state first, just that it's there. There are $n_1!$
rearrangements of these $n_1$ particles, so you have to divide by that to get the number of ways that you can get this
number of particles into state 1: $N!/n_1!(N - n_1)!$. For example, $N = 4$, $n_1 = 4$ as in the (400) configuration in the
preceding paragraph is $4!/0!\,4! = 1$, or $4!/3!\,1! = 4$ as in the (310) configuration.

Once you've got $n_1$ particles into the first state you want to put $n_2$ into the second state (out of the remaining
$N - n_1$). Then on to state 3.

The total number of ways that you can do this is the product of all of these numbers. For three allowed energies

$$\frac{N!}{n_1!(N - n_1)!}\cdot\frac{(N - n_1)!}{n_2!(N - n_1 - n_2)!}\cdot\frac{(N - n_1 - n_2)!}{n_3!(N - n_1 - n_2 - n_3)!} = \frac{N!}{n_1!\,n_2!\,n_3!} \tag{8.36}$$

There's a lot of cancellation and the final factor in the denominator is one because of the constraint $n_1 + n_2 + n_3 = N$.
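For four particles and three states the hand count can be handed to a short program. This sketch (the function names are
illustrative) lists every assignment of a state to each distinguishable particle, tallies how many assignments produce each
set of occupation numbers, and compares the tally with $N!/n_1!\,n_2!\,n_3!$.

```python
from collections import Counter
from itertools import product
from math import factorial

def count_by_enumeration(n_particles, n_states):
    # Try every way of assigning a state to each particle and tally the occupations.
    counts = Counter()
    for assignment in product(range(n_states), repeat=n_particles):
        occupation = tuple(assignment.count(s) for s in range(n_states))
        counts[occupation] += 1
    return counts

def count_by_formula(occupation):
    # N! / (n_1! n_2! ... ) for the given occupation numbers.
    ways = factorial(sum(occupation))
    for n in occupation:
        ways //= factorial(n)
    return ways
```

The enumeration visits $3^4 = 81$ assignments in all, so the counts over all configurations must add up to 81. That is one
more check on the formula.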

Lacking any other information about the particles, the most probable configuration is the one for which Eq. (8.36)
is a maximum. This calls for Lagrange multipliers because you want to maximize a complicated function of several
variables subject to constraints on $N$ and on $E$. Now all you have to do is to figure out how to differentiate with respect
to integers. Answer: If $N$ is large you will be able to treat these variables as continuous and to use standard calculus to
manipulate them.

For large $n$, recall Stirling's formula, Eq. (2.20),

$$n! \sim \sqrt{2\pi n}\,n^n e^{-n} \qquad\text{or its log:}\qquad \ln(n!) \sim \ln\sqrt{2\pi n} + n\ln n - n \tag{8.37}$$

This, I can differentiate. Maximizing (8.36) is the same as maximizing its logarithm, and that's easier to work with.

$$\text{maximize}\quad f = \ln(N!) - \ln(n_1!) - \ln(n_2!) - \ln(n_3!)$$
$$\text{subject to}\quad n_1 + n_2 + n_3 = N \quad\text{and}\quad n_1 E_1 + n_2 E_2 + n_3 E_3 = E$$

There are two constraints here, so there are two Lagrange multipliers.

$$\nabla\big(f - \lambda_1(n_1 + n_2 + n_3 - N) - \lambda_2(n_1 E_1 + n_2 E_2 + n_3 E_3 - E)\big) = 0$$

For $f$, use Stirling's approximation, but not quite. The term $\ln\sqrt{2\pi n}$ is negligible. For $n$ as small as $10^6$, it is about
$6 \times 10^{-7}$ of the whole. Logarithms are much smaller than powers. That means that I can use

$$\nabla\left(\sum_{\ell=1}^{3}\big(-n_\ell\ln(n_\ell) + n_\ell - \lambda_1 n_\ell - \lambda_2 n_\ell E_\ell\big)\right) = 0$$

This is easier than it looks because each derivative involves only one coordinate.

$$\frac{\partial}{\partial n_1} \;\rightarrow\; -\ln n_1 - 1 + 1 - \lambda_1 - \lambda_2 E_1 = 0, \quad\text{etc.}$$

This is

$$n_\ell = e^{-\lambda_1 - \lambda_2 E_\ell}, \qquad \ell = 1, 2, 3$$

There are two unknowns here, $\lambda_1$ and $\lambda_2$. There are two equations, for $N$ and $E$, and the parameter $\lambda_1$ simply determines
an overall constant, $e^{-\lambda_1} = C$.

$$C\sum_{\ell=1}^{3} e^{-\lambda_2 E_\ell} = N, \qquad\text{and}\qquad C\sum_{\ell=1}^{3} E_\ell\,e^{-\lambda_2 E_\ell} = E$$

The quantity $\lambda_2$ is usually denoted $\beta$ in this type of problem, and it is related to temperature by $\beta = 1/kT$, where
as usual the Lagrange multiplier is important on its own. It is usual to manipulate these results by defining the "partition
function"

$$Z(\beta) = \sum_{\ell=1}^{3} e^{-\beta E_\ell} \tag{8.38}$$

In terms of this function $Z$ you have

$$C = N/Z, \qquad\text{and}\qquad E = -\frac{N}{Z}\frac{dZ}{d\beta} \tag{8.39}$$

For a lot more on this subject, you can refer to any one of many books on thermodynamics or statistical physics. There
for example you can find the reason that $\beta$ is related to the temperature and how the partition function can form the
basis for computing everything there is to compute in thermodynamics. Especially there you will find that more powerful
versions of the same ideas will arise when you allow the total energy and the total number of particles to be variables
too.
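To see the two constraint equations do their job, pick numbers and solve them. The sketch below is only an illustration:
the three energies 0, 1, 2 (in arbitrary units), the totals $N = 1000$ and $E = 700$, the bracketing interval for $\beta$, and the
function names are all assumptions made here. Dividing the energy equation by the number equation eliminates $C$ and leaves one
equation for $\beta$, which bisection handles because the mean energy per particle decreases steadily as $\beta$ increases.

```python
import math

def partition_function(beta, energies):
    # Z(beta) = sum over levels of exp(-beta * E).
    return sum(math.exp(-beta * e) for e in energies)

def mean_energy(beta, energies):
    # E / N = (sum of E exp(-beta E)) / Z.
    z = partition_function(beta, energies)
    return sum(e * math.exp(-beta * e) for e in energies) / z

def solve_beta(energies, energy_per_particle, lo=-50.0, hi=50.0, steps=200):
    # Bisection: mean_energy is a decreasing function of beta.
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, energies) > energy_per_particle:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def occupations(n_total, beta, energies):
    # n_l = C exp(-beta E_l) with C = N / Z.
    z = partition_function(beta, energies)
    return [n_total * math.exp(-beta * e) / z for e in energies]
```

Calling `solve_beta([0.0, 1.0, 2.0], 0.7)` and then `occupations(1000, beta, [0.0, 1.0, 2.0])` gives a positive $\beta$ and
occupation numbers that decrease with energy. A target above 1.0 per particle, the value at $\beta = 0$, would need a negative $\beta$.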

8.13 Solid Angle 

The extension of the concept of angle to three dimensions is called "solid angle." To explain what this is, I'll first show
a definition of ordinary angle that's different from what you're accustomed to. When you see that, the extension to one
more dimension is easy.

Place an object in the plane somewhere not at the origin. You are at the origin and look at it. I want a definition
that describes what fraction of the region around you is spanned by this object. For this, draw a circle of radius $R$
centered at the origin and draw all the lines from everywhere on the object to the origin. These lines will intersect the
circle on an arc (or even a set of arcs) of length $s$. Define the angle subtended by the object to be $\theta = s/R$.

Now step up to three dimensions and again place yourself at the origin. This time place a sphere of radius $R$
around the origin and draw all the lines from the three dimensional object to the origin. This time the lines intersect the
sphere on an area of size $A$. Define the solid angle subtended by the object to be $\Omega = A/R^2$. (If you want four or more
dimensions, see problem 8.52.)

For the circle, the circumference is $2\pi R$, so if you're surrounded, the angle subtended is $2\pi R/R = 2\pi$ radians. For
the sphere, the area is $4\pi R^2$, so this time if you're surrounded, the solid angle subtended is $4\pi R^2/R^2 = 4\pi$ steradians.
That is the name for this unit.

All very pretty. Is it useful? Only if you want to describe radiative transfer, nuclear scattering, illumination, the 
structure of the atom, or rainbows. Except for illumination, these subjects center around one idea, that of a "cross 
section.” 

Cross Section, Absorption 

Before showing how to use solid angle to describe scattering, I'll take a simpler example: absorption. There is a hole
in a wall and I propose to measure its area. Instead of taking a ruler to it I blindly fire bullets at the wall and see how
many go in. The bigger the area, the larger the fraction that will go into the hole of course, but I have to make this
quantitative to make it useful.

Define the flux of bullets: $f = dN/(dt\,dA)$. That is, suppose that I'm firing all the bullets in the same direction,
but not starting from the same place. Pick an area $\Delta A$ perpendicular to the stream of bullets and pick a time interval
$\Delta t$. How many bullets pass through this area in this time? $\Delta N$, and that's proportional to both $\Delta A$ and $\Delta t$. The limit
of this quotient is the flux.

$$f = \lim_{\substack{\Delta t\to 0\\ \Delta A\to 0}} \frac{\Delta N}{\Delta t\,\Delta A} \tag{8.40}$$

Having defined the flux as a kind of density, call the (unknown) area of the hole $\sigma$. The rate at which these bullets enter
the hole is proportional to the size of the hole and to the flux of bullets, $R = f\sigma$, where $R$ is the rate of entry and $\sigma$
is the area of the hole. If I can measure the rate of absorption $R$ and the flux $f$, I have measured the area of the hole,
$\sigma = R/f$. This letter is commonly used for cross sections.
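The bullet experiment is easy to simulate. In this sketch everything specific is an assumption made for illustration: a
square wall 10 units on a side, a circular hole of radius 1 at its center, bullets landing uniformly at random, and a fixed
random seed so that the run is repeatable. The program never uses the formula for the area of a circle; it only counts
bullets, forms the flux and the absorption rate, and divides.

```python
import random

def estimate_hole_area(n_bullets=200_000, wall_side=10.0, hole_radius=1.0, seed=0):
    rng = random.Random(seed)
    duration = 1.0                      # firing time; it cancels in the ratio R / f
    wall_area = wall_side ** 2
    half = wall_side / 2
    hits = 0
    for _ in range(n_bullets):
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        if x * x + y * y < hole_radius ** 2:
            hits += 1                   # this bullet went through the hole
    flux = n_bullets / (duration * wall_area)   # f = dN / (dt dA)
    rate = hits / duration                      # R, bullets absorbed per unit time
    return rate / flux                          # sigma = R / f
```

The estimate fluctuates from run to run (or seed to seed) by a few hundredths for this many bullets, around the geometric
area $\pi$ of the assumed hole. Four times as many bullets cut the scatter in half.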

Why go to this complicated trouble for a hole? I probably shouldn't, but to measure absorption of neutrons hitting
nuclei this is precisely what you do. I can't use a ruler on a nucleus, but I can throw things at it. In this example, neutron
absorption by nuclei, the value of the measured absorption cross section can vary from millibarns to kilobarns, where a
barn is $10^{-24}$ cm$^2$. The radii of nuclei vary by a factor of only about six from hydrogen through uranium ($\sqrt[3]{238} = 6.2$),
so the cross section measured by bombarding the nucleus has little to do with the geometric area $\pi r^2$. It is instead a
measure of interaction strength.

Cross Section, Scattering 

There are many types of cross sections besides absorption, and the next simplest is the scattering cross section, especially
the differential scattering cross section.

The same flux of particles that you throw at an object may not be absorbed, but may scatter instead. You detect
the scattering by using a detector. (You were expecting a catcher's mitt?) The detector will have an area $\Delta A$ facing
the particles and be at a distance $r$ from the center of scattering. The detection rate will be proportional to the area of
the detector, but if I double $r$ for the same $\Delta A$, the detection rate will go down by a factor of four. The detection rate
is proportional to $\Delta A/r^2$, but this is just the solid angle of the detector from the center:

$$\Delta\Omega = \Delta A/r^2 \tag{8.41}$$

The detection rate is proportional to the incoming flux and to the solid angle of the detector. The proportionality is an
effective scattering area, $\Delta\sigma$.

$$\Delta R = f\,\Delta\sigma, \qquad\text{so}\qquad \frac{d\sigma}{d\Omega} = \frac{dR}{f\,d\Omega} \tag{8.42}$$

This is the differential scattering cross section.

You can compute this if you know something about the interactions involved. The one thing that you need is the
relationship between where the particle comes in and the direction in which it leaves. That is, the incoming particle is
aimed to hit at a distance $b$ (called the impact parameter) from the center and it scatters at an angle $\theta$, called of course
the scattering angle, from its original direction. Particles that come in at distance between $b$ and $b + db$ from the axis
through the center will scatter into directions between $\theta$ and $\theta + d\theta$.

The cross section for being sent in a direction between these two angles is the area of the ring: $d\sigma = 2\pi b\,db$.
Anything that hits in there will scatter into the outgoing angles shown. How much solid angle is this? Put the $z$-axis
of spherical coordinates to the right, so that $\theta$ is the usual spherical coordinate angle from $z$. The element of area on
the surface of a sphere is $dA = r^2\sin\theta\,d\theta\,d\phi$, so the integral over all the azimuthal angles $\phi$ around the ring just gives
a factor $2\pi$. The element of solid angle is then

$$d\Omega = \frac{dA}{r^2} = 2\pi\sin\theta\,d\theta$$

As a check on this, do the integral over all $\theta$ to get the total solid angle around a point, verifying that it is $4\pi$.
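The same check can be done numerically. A minimal sketch, assuming a midpoint rule with an arbitrary number of steps (the
function name is illustrative): it adds up $2\pi\sin\theta\,d\theta$ from the axis out to a chosen polar angle, which is the solid angle of
a cone of that half-angle.

```python
import math

def solid_angle_of_cone(theta_max, n=10000):
    # Midpoint-rule integral of 2 pi sin(theta) d(theta) from 0 to theta_max.
    h = theta_max / n
    return sum(2 * math.pi * math.sin((k + 0.5) * h) * h for k in range(n))
```

Integrating all the way to $\theta = \pi$ gives the full $4\pi$, and stopping at $\pi/2$ gives the $2\pi$ of a hemisphere. In general the
sum approaches $2\pi(1 - \cos\theta_{\max})$.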

Divide the effective area for this scattering by the solid angle, and the result is the differential scattering cross
section.

$$\frac{d\sigma}{d\Omega} = \frac{2\pi b\,db}{2\pi\sin\theta\,d\theta} = \frac{b}{\sin\theta}\frac{db}{d\theta}$$

If you have $\theta$ as a function of $b$, you can compute this. There are a couple of very minor modifications that you need in
order to complete this development. The first is that the derivative $db/d\theta$ can easily be negative, but both the area and
the solid angle are positive. That means that you need an absolute value here. One other complication is that one value
of $\theta$ can come from several values of $b$. It may sound unlikely, but it happens routinely. It even happens in the example
that comes up in the next section.

$$\frac{d\sigma}{d\Omega} = \sum_i \frac{b_i}{\sin\theta}\left|\frac{db_i}{d\theta}\right| \tag{8.43}$$
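A simple case to try this formula on, one that is not worked in the text and is assumed here purely as an illustration, is
a particle bouncing off a hard sphere of radius $R$ with equal angles of incidence and reflection. The geometry then gives
$b = R\cos(\theta/2)$, a single-valued relation, so the sum has one term. The sketch below differentiates $b(\theta)$ numerically; the
function names and the step size are illustrative choices.

```python
import math

def impact_parameter(theta, radius=1.0):
    # Assumed model: specular reflection from a hard sphere, b = R cos(theta / 2).
    return radius * math.cos(theta / 2)

def differential_cross_section(theta, radius=1.0, h=1e-6):
    # (b / sin theta) |db / d theta| with a central-difference derivative.
    b = impact_parameter(theta, radius)
    db_dtheta = (impact_parameter(theta + h, radius)
                 - impact_parameter(theta - h, radius)) / (2 * h)
    return b / math.sin(theta) * abs(db_dtheta)
```

In this model the result comes out as $R^2/4$ at every angle, and multiplying by the total solid angle $4\pi$ returns the
geometric area $\pi R^2$. The derivative $db/d\theta$ is negative here, so the absolute value is doing real work.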


The differential cross section often becomes much more involved than this, especially when it involves nuclei
breaking up in a collision, resulting in a range of possible energies of each part of the debris. In such collisions particles
can even be created, and the probabilities and energy ranges of the results are described by their own differential cross
sections. You will wind up with differential cross sections that look like $d\sigma/d\Omega_1\,d\Omega_2\cdots dE_1\,dE_2\cdots$. These rapidly
become so complex that it takes some elaborate computer programming to handle the information.


8.14 Rainbow 

An interesting, if slightly complicated example is the rainbow. Sunlight scatters from small drops of water in the air and 
the detector is your eye. The water drops are small enough that I’ll assume them to be spheres, where surface tension is 
enough to hold them in this shape for the ordinary small sizes of water droplets in the air. The first and simplest model 
uses geometric optics and Snell's law to figure out where the scattered light goes. This model ignores the wave nature 
of light and it does not take into account the fraction of the light that is transmitted and reflected at each surface. 



$$\sin\beta = n\sin\alpha, \qquad \theta = (\beta - \alpha) + (\pi - 2\alpha) + (\beta - \alpha), \qquad b = R\sin\beta \tag{8.44}$$

The light comes in at the indicated distance $b$ from the axis through the center of the sphere. It is then refracted,
reflected, and refracted. Snell's law describes the first and third of these, and the middle one has equal angles of incidence
and reflection. The dashed lines are from the center of the sphere. The three terms in Eq. (8.44) for the evaluation of
$\theta$ come from the three places at which the light changes direction, and they are the amount of deflection at each place.
The third equation simply relates $b$ to the radius of the sphere.

From these three equations, eliminate the two variables $\alpha$ and $\beta$ to get the single relation between $b$ and $\theta$ that
I'm looking for. When you do this, you find that the resulting equations are a bit awkward. It's sometimes easier to use
one of the two intermediate angles as a parameter, and in this case you will want to use $\beta$. From the picture you know
that it varies from zero to $\pi/2$. The third equation gives $b$ in terms of $\beta$. The first equation gives $\alpha$ in terms of $\beta$. The
second equation determines $\theta$ in terms of $\beta$ and the $\alpha$ that you've just found.

The parametrized relation between $b$ and $\theta$ is then

$$b = R\sin\beta, \qquad \theta = \pi + 2\beta - 4\sin^{-1}\left(\frac{1}{n}\sin\beta\right), \qquad (0 < \beta < \pi/2) \tag{8.45}$$

or you can carry it through and eliminate $\beta$.

$$\theta = \pi + 2\sin^{-1}\left(\frac{b}{R}\right) - 4\sin^{-1}\left(\frac{1}{n}\frac{b}{R}\right) \tag{8.46}$$

The derivative $db/d\theta = 1/[d\theta/db]$. Compute this.

$$\frac{d\theta}{db} = \frac{2}{\sqrt{R^2 - b^2}} - \frac{4}{\sqrt{n^2R^2 - b^2}}$$

In the parametrized form this is

$$\frac{db}{d\theta} = \frac{db/d\beta}{d\theta/d\beta} = \frac{R\cos\beta}{2 - 4\cos\beta/\sqrt{n^2 - \sin^2\beta}} \tag{8.47}$$

In analyzing this, it's convenient to have both forms, as you never know which one will be easier to interpret. (Have you
checked to see if they agree with each other in any special cases?)

These graphs are generated from Eq. (8.45) for eleven values of the index of refraction equally spaced from 1 to
1.5, and the darker curve corresponds to $n = 1.3$. The key factor that enters the cross-section calculation, Eq. (8.43),
is $db/d\theta$, because it goes to infinity when the curve has a vertical tangent. For water, with $n = 1.33$, the $b$-$\theta$ curve has
a vertical slope that occurs for $\theta$ a little less than 140°. That is the rainbow.
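The angle quoted for water can be reproduced in a few lines. This is a sketch under assumptions, not the method used to
draw the graphs: it takes the parametrized form of $\theta(\beta)$, finds where $d\theta/d\beta$ vanishes by bisection on $0 < \beta < \pi/2$, and
evaluates $\theta$ there. The function names and the number of bisection steps are illustrative, and the bracketing argument
(negative slope at $\beta = 0$, positive at $\pi/2$) holds only for $1 < n < 2$.

```python
import math

def theta_of_beta(beta, n):
    # Scattering angle in radians: pi + 2 beta - 4 arcsin(sin(beta) / n).
    return math.pi + 2 * beta - 4 * math.asin(math.sin(beta) / n)

def dtheta_dbeta(beta, n):
    # Derivative of theta with respect to beta; db/dtheta blows up where this is zero.
    return 2 - 4 * math.cos(beta) / math.sqrt(n * n - math.sin(beta) ** 2)

def rainbow_beta(n, steps=100):
    # Bisection for the zero of dtheta_dbeta on (0, pi/2); assumes 1 < n < 2.
    lo, hi = 0.0, math.pi / 2
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if dtheta_dbeta(mid, n) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rainbow_angle_degrees(n):
    # Scattering angle at the vertical tangent, in degrees.
    return math.degrees(theta_of_beta(rainbow_beta(n), n))
```

Setting the derivative to zero by hand gives $\cos^2\beta = (n^2 - 1)/3$, which is a convenient cross-check on the bisection. For
$n = 1.33$ the scattering angle comes out between 137° and 138°, and its supplement, the angular radius of the bow, between
42° and 43°.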

To complete this I should finish with $d\sigma/d\Omega$. The interesting part of the problem is near the vertical part of the
curve. To see what happens near such a point use a power series expansion near there. Not $b(\theta)$ but $\theta(b)$. This has zero
derivative here, so near the vertical point

$$\theta(b) = \theta_0 + \gamma(b - b_0)^2$$

At $(b_0, \theta_0)$, $d\theta/db$ (the reciprocal of Eq. (8.47)) is zero and Eq. (8.46) tells you $\theta_0$. The coefficient $\gamma$ comes from the
second derivative of Eq. (8.46) at $b_0$. What is the differential scattering cross section in this neighborhood?

$$b = b_0 \pm \sqrt{(\theta - \theta_0)/\gamma}, \qquad\text{so}\qquad db/d\theta = \pm\frac{1}{2\sqrt{\gamma(\theta - \theta_0)}}$$

$$\frac{d\sigma}{d\Omega} = \sum_i \frac{b_i}{\sin\theta}\left|\frac{db_i}{d\theta}\right| = \frac{b_0 + \sqrt{(\theta - \theta_0)/\gamma}}{\sin\theta}\,\frac{1}{2\sqrt{\gamma(\theta - \theta_0)}} + \frac{b_0 - \sqrt{(\theta - \theta_0)/\gamma}}{\sin\theta}\,\frac{1}{2\sqrt{\gamma(\theta - \theta_0)}}$$

$$= \frac{b_0}{\sin\theta\,\sqrt{\gamma(\theta - \theta_0)}} \approx \frac{b_0}{\sin\theta_0\,\sqrt{\gamma(\theta - \theta_0)}} \tag{8.48}$$

In the final expression, because this is near $\theta = \theta_0$ and because I'm doing a power series expansion of the exact solution
anyway, I dropped all the $\theta$-dependence except the dominant factors. This is the only consistent thing to do because
I've previously dropped higher order terms in the expansion of $\theta(b)$.

Why is this a rainbow? (1) With the sun at your back you see a bright arc of a circle in the direction for which
the scattering cross-section is very large. The angular radius of this circle is $\pi - \theta_0 \approx 42°$. (2) The value of $\theta_0$ depends
on the index of refraction, $n$, and that varies slightly with wavelength. The variation of this angle of peak intensity is

$$\frac{d\theta_0}{d\lambda} = \frac{d\theta_0}{db_0}\frac{db_0}{dn}\frac{dn}{d\lambda} \tag{8.49}$$

When you graph Eq. (8.48) note carefully that it is zero on the left of $\theta_0$ (smaller $\theta$) and large on the right. Large
scattering angles correspond to the region of the sky underneath the rainbow, toward the center of the circular arc. This
implies that there is much more light scattered toward your eye underneath the arc of the rainbow than there is above
it. Look at your next rainbow and compare the area of sky below and above the rainbow.

There's a final point about this calculation. I didn't take into account the fact that when light hits a surface, some 
is transmitted and some is reflected. The largest effect is at the point of internal reflection, because typically only about 
two percent of the light is reflected and the rest goes through. The cross section should be multiplied by this factor to 
be complete. The detailed equations for this are called the Fresnel formulas and they tell you the fraction of the light 
transmitted and reflected at a surface as a function of angle and polarization. 

This is far from the whole story about rainbows. Light is a wave, and the geometric optics approximation that 
I’ve used doesn’t account for everything. In fact Eq. (8.43) doesn't apply to waves, so the whole development has to be 
redone. To get an idea of some of the other phenomena associated with the rainbow, see for example 
www.usna.edu/Users/oceano/raylee/RainbowBridge/Chapter_8.html 
www.philiplaven.com/links.html 


Exercises 

1 For the functions f(x,y ) = Axy 2 sm(xy), x(t) = Ct 3 , y(t ) = Dt 2 , compute df / dt two ways. First use the chain 
rule, then do explicit substitution and compute it directly. 

2 Compute (df/dx) y and ( df/dy) x for 

(a) f{x,y) = x 2 - 2 xy + y 2 , 


(b) fix, y) = In (y/x), (c) fix, y) = (y + x)/(y - x) 
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3 Compute df / dx using the chain rule for 

(a) f(x,y) = Hy/x), y = x 2 1 (b) f[x,y) = (y + x)/(y-x), y = ax, 

(c) f(x, y) = sin (xy), y = l/x 

Also calculate the results by substituting y explicitly and then differentiating, comparing the results. 

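4 Let $f(x,y) = x^2 - 2xy$, and the polar coordinates are $x = r\cos\phi$, $y = r\sin\phi$. Compute

$$\left(\frac{\partial f}{\partial x}\right)_y,\quad \left(\frac{\partial f}{\partial x}\right)_\phi,\quad \left(\frac{\partial f}{\partial x}\right)_r,\quad \left(\frac{\partial f}{\partial y}\right)_x,\quad \left(\frac{\partial f}{\partial y}\right)_\phi,\quad \left(\frac{\partial f}{\partial y}\right)_r$$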

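5 Let $f(x,y) = x^2 - 2xy$, and the polar coordinates are $x = r\cos\phi$, $y = r\sin\phi$. Compute

$$\left(\frac{\partial f}{\partial r}\right)_\phi,\quad \left(\frac{\partial f}{\partial r}\right)_x,\quad \left(\frac{\partial f}{\partial r}\right)_y,\quad \left(\frac{\partial f}{\partial\phi}\right)_r,\quad \left(\frac{\partial f}{\partial\phi}\right)_x,\quad \left(\frac{\partial f}{\partial\phi}\right)_y$$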

6 For the function $f(u,v) = u^3 - v^3$, what is the value at $(u,v) = (2,1)$? Approximately what is its value at
$(u,v) = (2.01, 1.01)$? Approximately what is its value at $(u,v) = (2.01, 0.99)$?

7 Assume the Earth's atmosphere has uniform density and is 10 km high. What is its volume? What is the ratio of this
volume to the Earth's volume?

8 For a cube 1 m on a side, what volume of paint will you need in order to paint it to a thickness of 0.2 mm? Don't 
forget to paint all the sides. 

9 What is grad $r^2$? Do it in both rectangular and polar coordinates. Two dimensions will do. Are your results really the
same?

10 What is grad$(\alpha x^2 + \beta y^2)$? Do this in both rectangular and polar coordinates. For the polar form, put $x$ and $y$ in
terms of $r$ and $\phi$, then refer to Eq. (8.27) for the polar form of the gradient. Finally, compare the two results.

11 The Moon has a radius of about 1740 km and its distance from Earth averages about 384 000 km. What
solid angle does the Moon subtend from Earth? What solid angle does Earth (radius 6400 km) subtend from the Moon?

12 Express the cylindrical unit vectors $\hat r$, $\hat\phi$, $\hat z$ in terms of the rectangular ones, and vice versa.

13 Evaluate the volume of a sphere by integration in spherical coordinates.


Problems

8.1 Let $r = \sqrt{x^2+y^2}$, $x = A\sin\omega t$, $y = B\cos\omega t$. Use the chain rule to compute the derivative with respect to $t$ of
$e^{kr}$. Notice the various checks you can do on the result, verifying (or disproving) your result.

8.2 Sketch these functions* in plane polar coordinates:

(a) $r = a\cos\phi$ (b) $r = a\sec\phi$ (c) $r = a\phi$ (d) $r = a/\phi$ (e) $r^2 = a^2\sin 2\phi$

8.3 The two coordinates x and y are related by f(x,y ) = 0. What is the derivative of y with respect to x under 
these conditions? [What is df along this curve? And have you drawn a sketch?] Make up a test function (with enough 
structure to be a test but still simple enough to verify your answer independently) and see if your answer is correct. 
Ans: $-(\partial f/\partial x)/(\partial f/\partial y)$

8.4 If $x = u + v$ and $y = u - v$, show that

$$\left(\frac{\partial y}{\partial x}\right)_u = -\left(\frac{\partial y}{\partial x}\right)_v$$

Do this by application of the chain rule, Eq. (8.6). Then as a check do the calculation by explicit elimination of the
respective variables $v$ and $u$.

8.5 If $x = r\cos\phi$ and $y = r\sin\phi$, compute



8.6 What is the differential of $f(x,y,z) = \ln(xyz)$?

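8.7 If $f(x,y) = x^3 + y^3$ and you switch to plane polar coordinates, use the chain rule to evaluate

$$\left(\frac{\partial f}{\partial r}\right)_\phi,\quad \left(\frac{\partial f}{\partial\phi}\right)_r,\quad \left(\frac{\partial^2 f}{\partial r^2}\right)_\phi,\quad \left(\frac{\partial^2 f}{\partial\phi^2}\right)_r,\quad \frac{\partial^2 f}{\partial r\,\partial\phi}$$

Check one or more of these by substituting $r$ and $\phi$ explicitly and doing the derivatives.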


* See www-groups.dcs.st-and.ac.uk/~history/Curves/Curves.html for more.

8.8 When current $I$ flows through a resistance $R$ the heat produced is $I^2R$. Two terminals are connected in parallel by
two resistors having resistance $R_1$ and $R_2$. Given that the total current is divided as $I = I_1 + I_2$, show that the condition
that the total heat generated is a minimum leads to the relation $I_1R_1 = I_2R_2$. You don't need Lagrange multipliers to
solve this problem, but try them anyway.

8.9 Sketch the magnetic field represented by Eq. (8.24). I suggest that you start by fixing $r$ and drawing the $\vec B$-vectors
at various values of $\theta$. It will probably help your sketch if you first compute the magnitude of $\vec B$ to see how it varies
around the circle. Recall, this field is expressed in spherical coordinates, though you can take advantage of its symmetry 
about the z-axis to make the drawing simpler. Don’t stop with just the field at fixed r as I suggested you begin. The 
field fills space, so try to describe it. 

8.10 A drumhead can vibrate in more complex modes. One such mode that vibrates at a frequency higher than that of
Eq. (8.19) looks approximately like

$$z(r,\phi,t) = Ar\left(1 - r^2/R^2\right)\sin\phi\,\cos\omega_2 t$$

(a) Find the total kinetic energy of this oscillating drumhead.

(b) Sketch the shape of the drumhead at $t = 0$. Compare it to the shape of Eq. (8.19).

At the instant that the total kinetic energy is a maximum, what is the shape of the drumhead?

Ans: $\frac{\pi}{48}\sigma A^2\omega_2^2R^4\sin^2\omega_2 t$

8.11 Just as there is kinetic energy in a vibrating drumhead, there is potential energy, and as the drumhead moves 
its total potential energy will change because of the slight stretching of the material. The potential energy density 
(dP.E./dA) in a drumhead is 

$$u_p = \tfrac12 T(\nabla z)^2$$

T is the tension in the drumhead. It has units of Newtons/meter and it is the force per length you would need if you 
cut a small slit in the surface and had to hold the two sides of the slit together. This potential energy arises from the 
slight stretching of the drumhead as it moves away from the plane of equilibrium. 

(a) For the motion described by Eq. (8.19) compute the total potential energy. (Naturally, you will have checked the
dimensions first to see if the claimed expression for $u_p$ is sensible.)

(b) Energy is conserved, so the sum of the total potential energy and the total kinetic energy from Eq. (8.20) must be
a constant. What must the frequency $\omega$ be for this to hold? Is this a plausible result? A more accurate result, from
solving a differential equation, is $2.405\sqrt{T/\sigma R^2}$. Ans: $\sqrt{6T/\sigma R^2} = 2.45\sqrt{T/\sigma R^2}$

8.12 Repeat the preceding problem for the drumhead mode of problem 8.10. The exact result, calculated in terms of
roots of Bessel functions, is $3.832\sqrt{T/\sigma R^2}$. Ans: $4\sqrt{T/\sigma R^2}$

8.13 Sketch the gravitational field of the Earth from Eq. (8.23). Is the direction of the field plausible? Draw lots of 
arrows. 

8.14 Prove that the unit vectors in polar coordinates are related to those in rectangular coordinates by 

$$\hat r = \hat x\cos\phi + \hat y\sin\phi, \qquad \hat\phi = -\hat x\sin\phi + \hat y\cos\phi$$

What are $\hat x$ and $\hat y$ in terms of $\hat r$ and $\hat\phi$?

8.15 Prove that the unit vectors in spherical coordinates are related to those in rectangular coordinates by 

$$\hat r = \hat x\sin\theta\cos\phi + \hat y\sin\theta\sin\phi + \hat z\cos\theta$$
$$\hat\theta = \hat x\cos\theta\cos\phi + \hat y\cos\theta\sin\phi - \hat z\sin\theta$$
$$\hat\phi = -\hat x\sin\phi + \hat y\cos\phi$$

8.16 Compute the volume of a sphere using spherical coordinates. Also do it using rectangular coordinates. Also do it 
in cylindrical coordinates. 

8.17 Finish both integrals Eq. (8.21). Draw sketches to demonstrate that the limits stated there are correct. 

8.18 Find the volume under the plane $2x + 2y + z = 8a$ and over the triangle bounded by the lines $x = 0$, $y = 2a$, and
$x = y$ in the $x$-$y$ plane. Ans: $8a^3$

8.19 Find the volume enclosed by the doughnut-shaped surface (spherical coordinates) $r = a\sin\theta$. Ans: $\pi^2a^3/4$

8.20 In plane polar coordinates, compute $\partial\hat r/\partial\phi$, also $\partial\hat\phi/\partial\phi$. This means that $r$ is fixed and you're finding the change
in these vectors as you move around a circle. In both cases express the answer in terms of the $\hat r$-$\hat\phi$ vectors. Draw pictures
that will demonstrate that your answers are at least in the right direction. Ans: $\partial\hat\phi/\partial\phi = -\hat r$

8.21 Compute the gradient of the distance from the origin (in three dimensions) in three coordinate systems and verify
that they agree.

8.22 Taylor's power series expansion of a function of several variables was discussed in section 2.5. The Taylor series in
one variable was expressed in terms of an exponential in problem 2.30. Show that the series in three variables can be
written as

$$e^{\vec h\cdot\nabla} f(x,y,z)$$


8.23 The wave equation is (a) below. Change variables to z = x — vt and w = x + vt and show that in these coordinates 
this equation is (b) (except for a constant factor). Did you explicitly note which variables are kept fixed at each stage of 
the calculation? See also problem 8.53. 


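(a) $\dfrac{\partial^2 u}{\partial x^2} - \dfrac{1}{v^2}\dfrac{\partial^2 u}{\partial t^2} = 0$

(b) $\dfrac{\partial^2 u}{\partial z\,\partial w} = 0$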


8.24 The equation (8.23) comes from taking the gradient of the Earth's gravitational potential in an expansion to terms
in $1/r^3$.

$$V = -\frac{GM}{r} - \frac{GQ}{r^3}P_2(\cos\theta)$$

where $P_2(\cos\theta) = \tfrac32\cos^2\theta - \tfrac12$ is the second order Legendre polynomial. Compute $\vec g = -\nabla V$.


8.25 In problem 2.25 you computed the electric potential at large distances from a pair of charges, $-q$ at the origin and
$+q$ at $z = a$ ($r \gg a$). The result was

$$V = \frac{kqa}{r^2}P_1(\cos\theta)$$

where $P_1(\cos\theta) = \cos\theta$ is the first order Legendre polynomial. Compute the electric field from this potential, $\vec E = -\nabla V$.
And sketch it of course.


8.26 In problem 2.26 you computed the electric potential at large distances from a set of three charges, $-2q$ at the
origin and $+q$ at $z = \pm a$ ($r \gg a$). The result was

$$V = \frac{2kqa^2}{r^3}P_2(\cos\theta)$$

where $P_2(\cos\theta)$ is the second order Legendre polynomial. Compute the electric field from this potential, $\vec E = -\nabla V$.
And sketch it of course.

8.27 Compute the area of an ellipse having semi-major and semi-minor axes $a$ and $b$. Compare your result to that of
Eq. (8.35). Ans: $\pi ab$


8.28 Two equal point charges $q$ are placed at $z = \pm a$. The origin is a point of equilibrium; $\vec E = 0$ there. (a) Compute
the potential near the origin, writing $V$ in terms of powers of $x$, $y$, and $z$ near there, carrying the powers high enough
to describe the nature of the equilibrium point. Is $V$ a maximum, minimum, or saddle point there? It will be easier if you
carry the calculation as far as possible using vector notation, such as $|\vec r - a\hat z| = \sqrt{(\vec r - a\hat z)^2}$, and $r \ll a$.

(b) Write your result for $V$ near the origin in spherical coordinates also.

Ans: $\dfrac{2q}{4\pi\epsilon_0 a}\left[1 + \dfrac{r^2}{a^2}\left(\tfrac32\cos^2\theta - \tfrac12\right)\right]$

8.29 When current $I$ flows through a resistance $R$ the heat produced is $I^2R$. Two terminals are connected in parallel
by three resistors having resistance $R_1$, $R_2$, and $R_3$. Given that the total current is divided as $I = I_1 + I_2 + I_3$, show
that the condition that the total heat generated is a minimum leads to the relation $I_1R_1 = I_2R_2 = I_3R_3$. You can
easily do problem 8.8 by eliminating a coordinate then doing a derivative. Here it's starting to get sufficiently complex
that you should use Lagrange multipliers. Does $\lambda$ have any significance this time?


8.30 Given a right circular cylinder of volume $V$, what radius and height will provide the minimum total area for the
cylinder? Ans: $r = (V/2\pi)^{1/3}$, $h = 2r$

8.31 Sometimes the derivative isn't zero at a maximum or a minimum. Also, there are two types of maxima and minima:
local and global. The former is one that is max or min in the immediate neighborhood of a point and the latter is biggest
or smallest over the entire domain of the function. Examine these functions for maxima and minima both inside the
domains and on the boundary.

$|x|$, $(-1 \le x \le +2)$

$T_0\,(x^2 - y^2)/a^2$, $(-a \le x \le a,\ -a \le y \le a)$

$V_0\,(r^2/R^2)\,P_2(\cos\theta)$, $(r \le R$, 3 dimensions$)$


8.32 In Eq. (8.39) it is more common to specify $N$ and $\beta = 1/kT$, the Lagrange multiplier, than it is to specify $N$ and
$E$, the total energy. Pick three energies, $E_\ell$, to be 1, 2, and 3 electron volts. (a) What is the average energy, $E/N$, as
$\beta \to \infty$ ($T \to 0$)?

(b) What is the average energy as $\beta \to 0$?

(c) What are $n_1$, $n_2$, and $n_3$ in these two cases?

8.33 (a) Find the gradient of $V$, where $V = V_0\,(x^2+y^2+z^2)\,a^{-2}\,e^{-\sqrt{x^2+y^2+z^2}/a}$. (b) Find the gradient of $V$, where
$V = V_0\,(x+y+z)\,a^{-1}\,e^{-(x+y+z)/a}$.

8.34 A billiard ball of radius $R$ is suspended in space and is held rigidly in position. Very small pellets are thrown at
it and the scattering from the surface is completely elastic, with no friction. Compute the relation between the impact
parameter $b$ and the scattering angle $\theta$. Then compute the differential scattering cross section $d\sigma/d\Omega$.

Finally compute the total scattering cross section, the integral of this over $d\Omega$.

8.35 Modify the preceding problem so that the incoming object is a ball of radius $R_1$ and the fixed billiard ball has
radius $R_2$.

8.36 Find the differential scattering cross section from a spherical drop of water, but instead of Snell's law, use a
pre-Snell law: $\beta = n\alpha$, without the sines. Is there a rainbow in this case? Sketch $d\sigma/d\Omega$ versus $\theta$.

Ans: $R^2\sin 2\beta\big/\bigl[4\sin\theta\,|1 - 2/n|\bigr]$, where $\theta = \pi + 2(1 - 2/n)\beta$

8.37 From the equation (8.43), assuming just a single $b$ for a given $\theta$, what is the integral over all $d\Omega$ of $d\sigma/d\Omega$?

Ans: $\pi b_{\max}^2$

8.38 Solve Eq. (8.47) for $b$ when $d\theta/db = 0$. For $n = 1.33$ what value of $\theta$ does this give?

8.39 If the scattering angle $\theta = \frac{\pi}{2}\sin(\pi b/R)$ for $0 < b < R$, what is the resulting differential scattering cross section
(with graph)? What is the total scattering cross section? Start by sketching a graph of $\theta$ versus $b$.

Ans: $2R^2\big/\bigl[\pi^2\sin\theta\sqrt{1 - (2\theta/\pi)^2}\,\bigr]$

8.40 Find the signs of all the factors in Eq. (8.49), and determine from that whether red or blue is on the outside of 
the rainbow. Ans: Look 

8.41 If it suddenly starts to rain small, spherical diamonds instead of water, what happens to the rainbow? n = 2.4 for 
diamond. 

8.42 What would the rainbow look like for n = 2? You’ll have to look closely at the expansions in this case. For small 
b, where does the ray hit the inside surface of the drop? 

8.43 (a) The secondary rainbow occurs because there can be two internal reflections before the light leaves the drop.
What is the analog of Eqs. (8.44) for this case? (b) Repeat problems 8.38 and 8.40 for this case.

8.44 What is the shortest distance from the origin to the plane defined by $\vec A\cdot(\vec r - \vec r_0) = 0$? Do this using Lagrange
multipliers, and then explain why of course the answer is correct.

8.45 The U.S. Post Office has decided to use a norm like Eq. (6.11)(2) to measure boxes. The size is defined to be the 
sum of the height and the circumference of the box, and the circumference is around the thickest part of the package: 
"length plus girth.” What is the maximum volume you can ship if this size is constrained to be less than 130 inches? For 
this purpose, assume the box is rectangular, not cylindrical, though you may expect the cylinder to improve the result. 
Assume that the box's dimensions are $a$, $a$, $b$, with volume $a^2b$.

(a) Show that if you assume that the girth is 4a, then you will conclude that b > a and that you didn't measure the 
girth at the thickest part of the package. 

(b) Do it again with the opposite assumption, that you assume b is big so that the girth is 2b + 2a. Again show that it 
is a contradiction. 

(c) You have two inequalities that you must satisfy: girth plus length measured one way is less than $L = 130$ inches and
girth plus length measured the other way is too. That is, $4a + b < L$ and $3a + 2b < L$. Plot these regions in the $a$-$b$
plane, showing the allowed region in $a$-$b$ space. Also plot some curves of constant volume, $V = a^2b$. Show that the
point of maximum volume subject to these constraints is on the edge of this allowed region, and that it is at the corner
of intersection of the two inequalities. This is the beginning of the subject called “linear programming.”

Ans: a cube 

8.46 Plot $\theta$ versus $b$ in equation (8.45) or (8.46).

8.47 A disk of radius $R$ is at a distance $c$ above the $x$-$y$ plane and parallel to that plane. What is the solid angle that
this disk subtends from the origin? Ans: $2\pi\bigl[1 - c/\sqrt{c^2+R^2}\,\bigr]$

8.48 Within a sphere of radius $R$, what is the volume contained between the planes defined by $z = a$ and $z = b$?
Ans: $\pi(b-a)\bigl(R^2 - \tfrac13(b^2+ab+a^2)\bigr)$

8.49 Find the mean-square distance, $\frac1V\int r^2\,dV$, from a point on the surface of a sphere to points inside the sphere.
Note: Plan ahead and try to make this problem as easy as possible. Ans: $8R^2/5$

8.50 Find the mean distance, $\frac1V\int r\,dV$, from a point on the surface of a sphere to points inside the sphere. Unlike the
preceding problem, this requires some brute force. Ans: $6R/5$

8.51 A volume mass density is specified in spherical coordinates to be

$$\rho(r,\theta,\phi) = \rho_0\left(1 + r^2/R^2\right)\left[1 + \tfrac12\cos\theta\sin^2\phi + \tfrac14\cos^2\theta\sin^3\phi\right]$$

Compute the total mass in the volume $0 < r < R$. Ans: $32\pi\rho_0R^3/15$

8.52 The circumference of a circle is some constant times its radius ($C_1r$). For the two-dimensional surface that is a
sphere in three dimensions the area is of the form $C_2r^2$. Start from the fact that you know the integral $\int_{-\infty}^{\infty}dx\,e^{-x^2} = \pi^{1/2}$
and write out the following two dimensional integral twice. It is over the entire plane.

$\int dA\,e^{-r^2}$ using $dA = dx\,dy$ and using $dA = C_1r\,dr$

From this, evaluate $C_1$. Repeat this for $dV$ and $C_2r^2$ in three dimensions, evaluating $C_2$.

Now repeat this in arbitrary dimensions to evaluate $C_n$. Do you need to reread chapter one? In particular, what is $C_3$?
It tells you about the three dimensional hypersphere in four dimensions. From this, what is the total "hypersolid angle"
in four dimensions (like $4\pi$ in three)? Ans: $2\pi^2$

8.53 Do the reverse of problem 8.23. Start with the second equation there and change variables to see that it reverts 
to a constant times the first equation. 

8.54 Carry out the interchange of limits in Eq. (8.22). Does the drawing really represent the integral? 

8.55 Is $x^2 + xy + y^2$ a minimum or maximum or something else at $(0,0)$? Answer the same question for $x^2 + 2xy + y^2$
and for $x^2 + 3xy + y^2$. Sketch the surface $z = f(x,y)$ in each case.

8.56 Derive the conditions stated after Eq. (8.33), expressing the circumstances under which the Hessian matrix is 
positive definite. 

8.57 In the spirit of problems 8.10 et seq. what happens if you have a rectangular drumhead instead of a circular one? 
Let 0 < x < a and 0 < y < b. The drumhead is tied down at its edges, so an appropriate function that satisfies these 
conditions is 

$$z(x,y) = A\sin(n\pi x/a)\sin(m\pi y/b)\cos\omega t$$

Compute the total kinetic and the total potential energy for this oscillation, a function of time. For energy to be
conserved the total energy must be a constant, so compute the frequency $\omega$ for which this is true. As compared to the
previous problems about a circular drumhead, this turns out to give the exact results instead of only approximate ones.

Ans: $\omega^2 = \pi^2\dfrac{T}{\sigma}\left[\dfrac{n^2}{a^2} + \dfrac{m^2}{b^2}\right]$

8.58 Repeat problem 8.45 by another method. Instead of assuming that the box has a square end, allow it to be any 
rectangular box, so that its volume is V = abc. Now you have three independent variables to use, maximizing the volume 
subject to the post office’s constraint on length plus girth. This looks like it will have to be harder. Instead, it’s much 
easier. Draw pictures! Ans: still a cube

8.59 An asteroid is headed in the general direction of Earth, and its speed when far away is $v_0$
relative to the Earth. What is the total cross section for its hitting Earth? It is not necessary to
compute the complete orbit; all you have to do is use a couple of conservation laws. Express the
result in terms of the escape speed from Earth.

Ans: $\sigma = \pi R^2\bigl(1 + (v_{\rm esc}/v_0)^2\bigr)$

8.60 In three dimensions the differential scattering cross section appeared in Eqs. (8.42) and (8.43). If the world were 
two dimensional this area would be a length instead. What are the two corresponding equations in that case, giving you 
an expression for $d\ell/d\theta$? Apply this to the light scattering from a (two dimensional) drop of water and describe the
scattering results. For simplicity this time, assume the pre-Snell law as in problem 8.36. 

8.61 As in the preceding problem, but use the regular Snell law instead. 

8.62 This double integral is over the isosceles right triangle in the figure. The function to be
integrated is $f(t') = \alpha t'^3$, BUT FIRST, set it up for an arbitrary $f(t')$ and then set it up again but
with the order of integration reversed. In one of the two cases you should be able to do one integral
without knowing $f$. Having done this, apply your two results to this particular $f$ as a test case that
your work was correct. In the figure, $t'$ and $t''$ are the two coordinates and $t$ is the coordinate of the
top of the triangle.




Vector Calculus 1 


The first rule in understanding vector calculus is draw lots of pictures. This subject can become rather abstract if you let 
it, but try to visualize all the manipulations. Try a lot of special cases and explore them. Keep relating the manipulations 
to the underlying pictures and don't get lost in the forest of infinite series. Along with the pictures, there are three types 
of derivatives, a couple of types of integrals, and some theorems relating them. 

9.1 Fluid Flow 

When water or any fluid moves through a pipe, what is the relationship between the motion of the fluid and the total
rate of flow through the pipe (volume per time)? Take a rectangular pipe of sides $a$ and $b$ with fluid moving at constant
speed through it and with the velocity of the fluid being the same throughout the pipe. It's a simple calculation: In time
$\Delta t$ the fluid moves a distance $v\Delta t$ down the pipe. The cross-section of the pipe has area $A = ab$, so the volume that
moves past a given flat surface is $\Delta V = Av\Delta t$. The flow rate is the volume per time, $\Delta V/\Delta t = Av$. (The usual limit
as $\Delta t \to 0$ isn't needed here.)


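[Figure: panels (a) and (b), each showing the area $A$ and the distance $v\Delta t$ moved by the fluid; in (b) the surface is tilted.]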


Just to make the problem look a little more involved, what happens to the result if I ask for the flow through a
surface that is tilted at an angle to the velocity? Do the calculation the same way as before, but use the drawing (b)
instead of (a). The fluid still moves a distance $v\Delta t$, but the volume that moves past this flat but tilted surface is not
its new (bigger) area $A$ times $v\Delta t$. The area of a parallelogram is not the product of its sides and the volume of a
parallelepiped is not the area of a base times the length of another side.

The area of a parallelogram is the length of one side times the perpendicular distance from that side to its opposite
side. Similarly the volume of a parallelepiped is the area of one side times the perpendicular distance from that side
to the side opposite. The perpendicular distance is not the distance that the fluid moved ($v\Delta t$). This perpendicular
distance is smaller by a factor $\cos\alpha$, where $\alpha$ is the angle that the plane is tilted. It is most easily described by the angle
that the normal to the plane makes with the direction of the fluid velocity.

$$\Delta V = Ah = A(v\Delta t)\cos\alpha$$

The flow rate is then $\Delta V/\Delta t = Av\cos\alpha$. Introduce the unit normal vector $\hat n$, then this expression can be rewritten in
terms of a dot product,

$$Av\cos\alpha = A\vec v\cdot\hat n = \vec A\cdot\vec v \qquad (9.1)$$

where $\alpha$ is the angle between the direction of the fluid velocity and the normal to the area.

This invites the definition of the area itself as a vector, and that's what I wrote in the final
expression. The vector $\vec A$ is a notation for $A\hat n$, and defines the area vector. If it looks a little
odd to have an area be a vector, do you recall the geometric interpretation of a cross product?
That's the vector perpendicular to two given vectors and it has a magnitude equal to the area of
the parallelogram between the two vectors. It's the same thing.
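
As a quick numerical check of Eq. (9.1), here is a minimal sketch in Python. The pipe dimensions, the speed, the tilt angle, and the helper names (`dot`, `cross`, `flow_rate`) are all made up for illustration; none of them come from the text. It builds the area vector as a cross product of two edge vectors of the surface and shows that the tilted surface, though bigger, passes the same flow as the perpendicular one.

```python
import math

def dot(u, v):
    """Dot product of two 3-component vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Cross product of two 3-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def flow_rate(area_vector, velocity):
    """Volume per time through a flat surface in a uniform flow: A . v, Eq. (9.1)."""
    return dot(area_vector, velocity)

# Hypothetical numbers: pipe height b (along y), depth a (along z), speed v along x.
a, b, v = 0.2, 0.1, 3.0
velocity = (v, 0.0, 0.0)

# (a) Surface perpendicular to the flow; area vector from its two edges.
edge_y = (0.0, b, 0.0)
edge_z = (0.0, 0.0, a)
A_flat = cross(edge_y, edge_z)            # points along +x, magnitude a*b

# (b) Surface tilted by alpha. It still spans the pipe from bottom to top,
# so its slanted edge is longer: b / cos(alpha).
alpha = math.radians(35.0)
slant = b / math.cos(alpha)
edge_tilted = (-slant * math.sin(alpha), slant * math.cos(alpha), 0.0)
A_tilt = cross(edge_tilted, edge_z)       # magnitude a*b/cos(alpha), tilted by alpha

print(flow_rate(A_flat, velocity))        # a*b*v
print(flow_rate(A_tilt, velocity))        # the same flow, despite the bigger area
```

The bigger area and the factor $\cos\alpha$ cancel, which is the point of writing the flow as $\vec A\cdot\vec v$.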

General Flow, Curved Surfaces 

The fluid velocity will not usually be a constant in space. It will be some function of position. The surface doesn't 
have to be flat; it can be cylindrical or spherical or something more complicated. How do you handle this? That's why 
integrals were invented. 

The idea behind an integral is that you will divide a complicated thing into small pieces and add the results of the 
small pieces to estimate the whole. Improve the estimation by making more, smaller pieces, and in the limit as the size 
of the pieces goes to zero get an exact answer. That’s the procedure to use here. 

The concept of the surface integral is that you systematically divide a surface into a number ($N$) of pieces
($k = 1, 2, \ldots N$). The pieces have area $\Delta A_k$ and each piece has a unit normal vector $\hat n_k$. Within the middle of each
of these areas the fluid has a velocity $\vec v_k$. This may not be a constant, but as usual with integrals, you pick a point
somewhere in the little area and pick the $\vec v$ there; in the limit as all the pieces of area shrink to zero it won't matter
exactly where you picked it. The flow rate through one of these pieces is Eq. (9.1), $\vec v_k\cdot\hat n_k\Delta A_k$, and the corresponding
estimate of the total flow through the surface is, using the notation $\Delta\vec A_k = \hat n_k\Delta A_k$,

$$\sum_{k=1}^N \vec v_k\cdot\Delta\vec A_k$$

This limit as the size of each piece is shrunk to zero and correspondingly the number of pieces goes to infinity is the
definition of the integral

$$\int\vec v\cdot d\vec A = \lim_{\Delta A_k\to0}\sum_{k=1}^N\vec v_k\cdot\Delta\vec A_k \qquad (9.2)$$

Example of Flow Calculation 

In the rectangular pipe above, suppose that the flow exhibits shear, rising from zero at the bottom to $v_0$ at the top. The
velocity field is

$$\vec v(x,y,z) = v_x(y)\,\hat x = v_0\frac{y}{b}\,\hat x \qquad (9.3)$$

The origin is at the bottom of the pipe and the $y$-coordinate is measured upward from the origin. What is the flow rate
through the area indicated, tilted at an angle $\phi$ from the vertical? The distance in and out of the plane of the picture
(the z-axis) is the length a. Can such a fluid flow really happen? Yes, real fluids such as water have viscosity, and if you 
construct a very wide pipe but not too high, and leave the top open to the wind, the horizontal wind will drag the fluid 
at the top with it (even if not as fast). The fluid at the bottom is kept at rest by the friction with the bottom surface. 
In between you get a gradual transition in the flow that is represented by Eq. (9.3). 





Now to implement the calculation of the flow rate: 

Divide the area into $N$ pieces of length $\Delta\ell_k$ along the slant.

The length in and out is $a$ so the piece of area is $\Delta A_k = a\Delta\ell_k$.

The unit normal is $\hat n_k = \hat x\cos\phi - \hat y\sin\phi$. (It happens to be independent of the index $k$, but that's special to this
example.)

The velocity vector at the position of this area is $\vec v = v_0\hat x\,y_k/b$.

Put these together and you have the piece of the flow rate contributed by this area.

$$\Delta\text{flow}_k = \vec v\cdot\Delta\vec A_k = v_0\frac{y_k}{b}\,\hat x\cdot a\Delta\ell_k\bigl(\hat x\cos\phi - \hat y\sin\phi\bigr) = v_0\frac{y_k}{b}\,a\Delta\ell_k\cos\phi = v_0\frac{\ell_k\cos\phi}{b}\,a\Delta\ell_k\cos\phi$$

In the last expression I put all the variables in terms of $\ell$, using $y = \ell\cos\phi$.
Now sum over all these areas and take a limit.

$$\lim_{\Delta\ell_k\to0}\sum_{k=1}^N v_0\frac{\ell_k\cos\phi}{b}\,a\Delta\ell_k\cos\phi = \int_0^{b/\cos\phi}d\ell\,v_0\frac{\ell}{b}\,a\cos^2\phi = v_0\frac{a}{b}\cos^2\phi\,\frac{\ell^2}{2}\bigg|_0^{b/\cos\phi} = v_0\frac{a}{2b}\left(\frac{b}{\cos\phi}\right)^2\cos^2\phi = v_0\frac{ab}{2}$$

This turns out to be independent of the orientation of the plane; the parameter $\phi$ is absent from the result. If you think
of two planes, at angles $\phi_1$ and $\phi_2$, what flows into one flows out of the other. Nothing is lost in between.
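
The sum-then-limit recipe above translates almost line for line into a loop. This is only a sketch: the function name `sheared_flow_through_plane`, the number of pieces, and the sample values of $v_0$, $a$, $b$ are my own choices for illustration. Each strip uses the velocity at its midpoint, and the total should come out close to $v_0ab/2$ whatever tilt angle you feed it.

```python
import math

def sheared_flow_through_plane(v0, a, b, phi, n_pieces=1000):
    """Riemann sum of v . dA for v = v0 (y/b) x-hat over a plane tilted by phi."""
    slant_length = b / math.cos(phi)       # the plane spans the pipe from y = 0 to y = b
    d_ell = slant_length / n_pieces        # width of each strip along the slant
    total = 0.0
    for k in range(n_pieces):
        ell = (k + 0.5) * d_ell            # midpoint of the k-th strip
        y = ell * math.cos(phi)            # height of that strip above the bottom
        v_x = v0 * y / b                   # sheared velocity, Eq. (9.3)
        n_x = math.cos(phi)                # x-component of the unit normal
        total += v_x * n_x * a * d_ell     # v . n times the strip area
    return total

# Hypothetical values, chosen only for the check.
v0, a, b = 2.0, 0.3, 0.1
for phi in (0.0, 0.3, 1.0):
    print(phi, sheared_flow_through_plane(v0, a, b, phi), v0 * a * b / 2)
```

Because the integrand is linear in $\ell$, the midpoint sum is already exact up to rounding, so this particular example does not show the convergence you would see with a more complicated velocity field.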


Another Flow Calculation 

Take the same sort of fluid flow in a pipe, but make it a little more complicated. Instead of a flat surface, make it a
cylinder. The axis of the cylinder is in and out in the picture and its radius is half the width of the pipe. Describe the
coordinates on the surface by the angle $\theta$ as measured from the midline. That means that $-\pi/2 < \theta < \pi/2$. Divide
the surface into pieces that are rectangular strips of length $a$ (in and out in the picture) and width $b\Delta\theta_k/2$. (The radius
of the cylinder is $b/2$.)

$$\Delta A_k = a\frac{b}{2}\Delta\theta_k, \quad\text{and}\quad \hat n_k = \hat x\cos\theta_k + \hat y\sin\theta_k \qquad (9.4)$$

The velocity field is the same as before, $\vec v(x,y,z) = v_0\hat x\,y/b$, so the contribution to the flow rate through this piece of
the surface is

$$\vec v_k\cdot\Delta\vec A_k = v_0\frac{y_k}{b}\,\hat x\cdot a\frac{b}{2}\Delta\theta_k\,\hat n_k$$

The value of $y_k$ at the angle $\theta_k$ is

$$y_k = \frac{b}{2} + \frac{b}{2}\sin\theta_k, \qquad\text{so}\qquad \frac{y_k}{b} = \frac12\bigl[1 + \sin\theta_k\bigr]$$

Put the pieces together and you have

$$v_0\frac12\bigl[1+\sin\theta_k\bigr]\hat x\cdot a\frac{b}{2}\Delta\theta_k\bigl[\hat x\cos\theta_k + \hat y\sin\theta_k\bigr] = v_0\frac12\bigl[1+\sin\theta_k\bigr]a\frac{b}{2}\Delta\theta_k\cos\theta_k$$

The total flow is the sum of these over $k$ and then the limit as $\Delta\theta_k \to 0$.

$$\lim_{\Delta\theta_k\to0}\sum_k v_0\frac12\bigl[1+\sin\theta_k\bigr]a\frac{b}{2}\Delta\theta_k\cos\theta_k = \int_{-\pi/2}^{\pi/2}v_0\frac12\bigl[1+\sin\theta\bigr]a\frac{b}{2}\,d\theta\,\cos\theta$$

Finally you can do the two terms of the integral: Look at the second term first. You can of course start grinding away
and find the right trigonometric formula to do the integral, OR, you can sketch a graph of the integrand, $\sin\theta\cos\theta$, on
the interval $-\pi/2 < \theta < \pi/2$ and write the answer down by inspection. The first part of the integral is

$$v_0\frac{ab}{4}\int_{-\pi/2}^{\pi/2}\cos\theta\,d\theta = v_0\frac{ab}{4}\sin\theta\bigg|_{-\pi/2}^{\pi/2} = v_0\frac{ab}{2}$$

And this is the same result as for the flat surface calculation. I set it up so that the two results are the same; it's easier to 
check that way. Gauss’s theorem of vector calculus will guarantee that you get the same result for any surface spanning 
this pipe and for this particular velocity function. 
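
If you'd rather let a computer do the grinding, the same sum over strips of the half-cylinder can be sketched in a few lines. The function name `sheared_flow_through_half_cylinder`, the strip count, and the sample numbers are assumptions of mine, not part of the text; the strips and the normal follow Eq. (9.4).

```python
import math

def sheared_flow_through_half_cylinder(v0, a, b, n_pieces=2000):
    """Riemann sum of v . dA over the half-cylinder of radius b/2, Eq. (9.4)."""
    d_theta = math.pi / n_pieces                   # theta runs from -pi/2 to +pi/2
    total = 0.0
    for k in range(n_pieces):
        theta = -math.pi / 2 + (k + 0.5) * d_theta # midpoint of the k-th strip
        y = 0.5 * b * (1.0 + math.sin(theta))      # height of the strip above the bottom
        v_x = v0 * y / b                           # sheared velocity, along x only
        strip_area = a * 0.5 * b * d_theta         # length a times arc width (b/2) d_theta
        total += v_x * math.cos(theta) * strip_area  # x-hat . n-hat = cos(theta)
    return total

# Hypothetical values, the same ones a flat-surface check might use.
v0, a, b = 2.0, 0.3, 0.1
print(sheared_flow_through_half_cylinder(v0, a, b), v0 * a * b / 2)
```

The $\sin\theta\cos\theta$ part of the sum cancels between the upper and lower halves, just as the sketch of the integrand suggests, and what is left approaches $v_0ab/2$ as the strips get narrower.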


9.2 Vector Derivatives 

I want to show the underlying ideas of the vector derivatives, divergence and curl, and as the names themselves come 
from the study of fluid flow, that's where I'll start. You can describe the flow of a fluid, either gas or liquid or anything 
else, by specifying its velocity field, $\vec v(x,y,z) = \vec v(\vec r)$.



For a single real-valued function of a real variable, it is often too complex to capture all the properties of a function 
at one glance, so it’s going to be even harder here. One of the uses of ordinary calculus is to provide information about 
the local properties of a function without attacking the whole function at once. That is what derivatives do. If you 
know that the derivative of a function is positive at a point then you know that it is increasing there. This is such an
ordinary use of calculus that you hardly give it a second thought (until you hit some advanced calculus and discover that
some continuous functions don't even have derivatives). The geometric concept of derivative is the slope of the curve 
at a point — the tangent of the angle between the x-axis and the straight line that best approximates the curve at that 
point. Going from this geometric idea to calculating the derivative takes some effort. 

How can you do this for fluid flow? If you inject a small amount of dye into the fluid at some point it will spread 
into a volume that depends on how much you inject. As time goes on this region will move and distort and possibly 
become very complicated, too complicated to grasp in one picture. 



There must be a way to get a simpler picture. There is. Do it in the same spirit that you introduce the derivative, 
and concentrate on a little piece of the picture. Inject just a little bit of dye and wait only a little time. To make it 
explicit, assume that the initial volume of dye forms a sphere of (small) volume V and let the fluid move for a little time. 

1. In a small time $\Delta t$ the center of the sphere will move.

2. The sphere can expand or contract, changing its volume. 

3. The sphere can rotate. 

4. The sphere can distort. 


Div, Curl, Strain 

The first one, the motion of the center, tells you about the velocity at the center of the sphere. It's like knowing the 
value of a function at a point, and that tells you nothing about the behavior of the function in the neighborhood of the 
point. 


The second one, the volume, gives new information. You can simply take the time derivative dV / dt to see if 
the fluid is expanding or contracting; just check the sign and determine if it's positive or negative. But how big is it? 
That's not yet in a useful form because the size of this derivative will depend on how large the original volume is. If
you put in twice as much dye, each part of the volume will change and there will be twice as much rate of change in the 
total volume. Divide the time derivative by the volume itself and this effect will cancel. Finally, to get the effect at one 
point take the limit as the volume approaches a point. This defines a kind of derivative of the velocity field called the 
divergence. 


$$\lim_{V\to 0}\frac{1}{V}\frac{dV}{dt} = \text{divergence of } \vec v \tag{9.5}$$
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This doesn’t tell you how to compute it any more than saying that the derivative is the slope tells you how to compute 
an ordinary* derivative. I'll have to work that out. 

But first look at the third way that the sphere can move: rotation. Again, if you take a large object it will distort 
a lot and it will be hard to define a single rotation for it. Take a very small sphere instead. The time derivative of 
this rotation is its angular velocity, the vector $\vec\omega$. In the limit as the sphere approaches a point, this tells me about the 
rotation of the fluid in the immediate neighborhood of that point. If I place a tiny paddlewheel in the fluid, how will it 
rotate? 


$$2\vec\omega = \text{curl of } \vec v \tag{9.6}$$


The factor of 2 is for later convenience. 

After considering expansion and rotation, the final way that the sphere can change is that it can alter its shape. In 
a very small time interval, the sphere can slightly distort into an ellipsoid. This will lead to the mathematical concept of 
the strain. This is important in the subject of elasticity and viscosity, but I'll put it aside for now save for one observation: 
how much information is needed to describe whatever it is? The sphere changes to an ellipsoid, and the first question 
is: what is the longest axis and how much stretch occurs along it — that's the three components of a vector. After that 
what is the shortest axis and how much contraction occurs along it? That’s one more vector, but you need only two 
new components to define its direction because it's perpendicular to the long axis. After this there's nothing left. The 
direction of the third axis is determined and so is its length if you assume that the total volume hasn’t changed. You 
can assume that is so because the question of volume change is already handled by the divergence; you don’t need it 
here too. The total number of components needed for this object is 2 + 3 = 5. It comes under the heading of tensors. 

9.3 Computing the divergence 

Now how do you calculate these? I'll start with the simplest, the divergence, and compute the time derivative of a 
volume from the velocity field. To do this, go back to the definition of derivative: 


$$\frac{dV}{dt} = \lim_{\Delta t\to 0}\frac{V(t+\Delta t) - V(t)}{\Delta t} \tag{9.7}$$


* Can you start from the definition of the derivative as a slope, use it directly with no limits, and compute the 
derivative of $x^2$ with respect to $x$, getting $2x$? It can be done. 
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Pick an arbitrary surface to start with and see how the volume changes as the fluid moves, carrying the surface 
with it. In time $\Delta t$ a point on the surface will move by a distance $\vec v\,\Delta t$ and it will carry with it a piece of neighboring 
area $\Delta A$. This area sweeps out a volume. This piece of volume is not $\Delta A$ times $v\,\Delta t$ because the motion isn't likely 
to be perpendicular to the surface. It's only the component of the velocity normal to the surface that contributes to the 
volume swept out. Use $\hat n$ to denote the unit vector perpendicular to $\Delta A$, then this volume is $\Delta A\,\hat n\cdot\vec v\,\Delta t$. This is the 
same as the calculation for fluid flow except that I’m interpreting the picture differently. 

If at a particular point on the surface the normal n is more or less in the direction of the velocity then this dot 
product is positive and the change in volume is positive. If it's opposite the velocity then the change is negative. The 
total change in volume of the whole initial volume is the sum over the entire surface of all these changes. Divide the 
surface into a lot of pieces $\Delta A_k$ with accompanying unit normals $\hat n_k$, then 

$$\Delta V_{\text{total}} = \sum_k \Delta A_k\,\hat n_k\cdot\vec v_k\,\Delta t$$

Not really. I have to take a limit before this becomes an equality. The limit of this as all the $\Delta A_k\to 0$ defines an integral 

$$\Delta V_{\text{total}} = \oint dA\,\hat n\cdot\vec v\,\Delta t$$

and this integral notation is special; the circle through the integral designates an integral over the whole closed surface 
and the direction of $\hat n$ is always taken to be outward. Finally, divide by $\Delta t$ and take the limit as $\Delta t$ approaches zero. 


$$\frac{dV}{dt} = \oint dA\,\hat n\cdot\vec v \tag{9.8}$$


The $\vec v\cdot\hat n\,dA$ is the rate at which the area $dA$ sweeps out volume as it is carried with the fluid. Note: There's nothing in 
this calculation saying that I have to take the limit as $V\to 0$; it's a perfectly general expression for the rate of change 
of volume in a surface being carried with the fluid. It's also a completely general expression for the rate of flow of fluid 
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through a fixed surface as the fluid moves past it. I'm interested in the first interpretation for now, but the second is 
just as valid in other contexts. 

Again, use the standard notation in which the area vector combines the unit normal and the area: $d\vec A = \hat n\,dA$. 

$$\text{divergence of } \vec v = \lim_{V\to 0}\frac{1}{V}\frac{dV}{dt} = \lim_{V\to 0}\frac{1}{V}\oint \vec v\cdot d\vec A \tag{9.9}$$

If the fluid is on average moving away from a point then the divergence there is positive. It’s diverging. 
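
You can try the definition in Eq. (9.9) numerically before doing any algebra. The following is only a sketch under stated assumptions: the cube is centered on the point (rather than having a corner there), each face integral is approximated by the value at the face center times the face area, and `sample_field` is a made-up velocity field chosen so that its divergence, found by differentiating each component, is $2x + 3$.

```python
def flux_divergence(v, p, h=1e-3):
    """Estimate div v at p as (outward flux through a small cube) / (cube volume).

    v: function (x, y, z) -> (vx, vy, vz); p: cube center; h: edge length.
    Each face integral is approximated by the face-center value times h*h.
    """
    x, y, z = p
    half = h / 2
    flux = 0.0
    # Faces normal to x: outward normal is +x on the right, -x on the left.
    flux += (v(x + half, y, z)[0] - v(x - half, y, z)[0]) * h * h
    # Faces normal to y.
    flux += (v(x, y + half, z)[1] - v(x, y - half, z)[1]) * h * h
    # Faces normal to z.
    flux += (v(x, y, z + half)[2] - v(x, y, z - half)[2]) * h * h
    return flux / h**3


def sample_field(x, y, z):
    # Hypothetical velocity field; its divergence is 2x + 3 + 0.
    return (x * x, 3 * y, x * y)


estimate = flux_divergence(sample_field, (1.0, 2.0, 0.5))
exact = 2 * 1.0 + 3
print(estimate, exact)
```

Only the component of the velocity normal to each face enters, which is the same observation that makes most terms drop out of the long calculation.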

The Divergence as Derivatives 

This is still a long way from something that you can easily compute. I'll first go through a detailed analysis of how you 
turn this into a simple result, and I'll then go back to try to capture the essence of the derivation so you can see how 
it applies in a wide variety of coordinate systems. At that point I’ll also show how to get to the result with a lot less 
algebra. You will see that a lot of the terms that appear in this first calculation will vanish in the end. It's important 
then to go back and see what was really essential to the calculation and what was not. As you go through this derivation 
then, try to anticipate which terms are going to be important and which terms are going to disappear. 

Express the velocity in rectangular components, $v_x\hat x + v_y\hat y + v_z\hat z$. For the small volume, choose a rectangular 
box with sides parallel to the axes. One corner is at point $(x_0, y_0, z_0)$ and the opposite corner has coordinates that differ 
from these by $(\Delta x, \Delta y, \Delta z)$. Expand everything in a power series about the first corner as in section 2.5. Instead of 
writing out $(x_0, y_0, z_0)$ every time, I'll abbreviate it by $(_0)$. 


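[Figure: rectangular box with one corner at $(x_0, y_0, z_0)$ and edges $\Delta x$, $\Delta y$, $\Delta z$ along the axes]

$$\begin{aligned}
v_x(x,y,z) = v_x(_0) &+ (x-x_0)\frac{\partial v_x}{\partial x}(_0) + (y-y_0)\frac{\partial v_x}{\partial y}(_0) + (z-z_0)\frac{\partial v_x}{\partial z}(_0) \\
&+ \frac12 (x-x_0)^2\frac{\partial^2 v_x}{\partial x^2}(_0) + (x-x_0)(y-y_0)\frac{\partial^2 v_x}{\partial x\,\partial y}(_0) + \cdots
\end{aligned} \tag{9.10}$$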


There are six integrals to do, one for each face of the box, and there are three functions, $v_x$, $v_y$, and $v_z$ to expand in 
three variables $x$, $y$, and $z$. Don't Panic. A lot of these are zero. If you look at the face on the right in the sketch you 
see that it is parallel to the y-z plane and has normal $\hat n = \hat x$. When you evaluate $\vec v\cdot\hat n$, only the $v_x$ term survives; flow 
parallel to the surface ($v_y$, $v_z$) contributes nothing to volume change along this part of the surface. Already then, many 
terms have simply gone away. 
Write the two integrals over the two surfaces parallel to the y-z plane, one at $x_0$ and one at $x_0 + \Delta x$. 


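$$\int_{\text{right}}\vec v\cdot d\vec A + \int_{\text{left}}\vec v\cdot d\vec A = \int_{y_0}^{y_0+\Delta y}dy\int_{z_0}^{z_0+\Delta z}dz\,v_x(x_0+\Delta x,y,z) - \int_{y_0}^{y_0+\Delta y}dy\int_{z_0}^{z_0+\Delta z}dz\,v_x(x_0,y,z)$$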


The minus sign comes from the dot product because $\hat n$ points left on the left side. You can evaluate these integrals 
by using their power series representations, and though you may have an infinite number of terms to integrate, at least 
they’re all easy. Take the first of them: 


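$$\begin{aligned}
\int_{y_0}^{y_0+\Delta y}dy\int_{z_0}^{z_0+\Delta z}dz\,&\Big[v_x(_0) + (\Delta x)\frac{\partial v_x}{\partial x}(_0) + (y-y_0)\frac{\partial v_x}{\partial y}(_0) + (z-z_0)\frac{\partial v_x}{\partial z}(_0) + \frac12(\Delta x)^2\frac{\partial^2 v_x}{\partial x^2}(_0) + \cdots\Big] \\
= {}& v_x(_0)\Delta y\Delta z + (\Delta x)\frac{\partial v_x}{\partial x}(_0)\Delta y\Delta z + \frac12(\Delta y)^2\Delta z\frac{\partial v_x}{\partial y}(_0) + \frac12(\Delta z)^2\Delta y\frac{\partial v_x}{\partial z}(_0) \\
&+ \frac12(\Delta x)^2\frac{\partial^2 v_x}{\partial x^2}(_0)\Delta y\Delta z + \cdots
\end{aligned}$$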


Now look at the second integral, the one that you have to subtract from this one. Before plunging in to the calculation, 
stop and look around. What will cancel; what will contribute; what will not contribute? The only difference is that this 
is now evaluated at $x_0$ instead of at $x_0 + \Delta x$. The terms that have $\Delta x$ in them simply won't appear this time. All the 
rest are exactly the same as before. That means that all the terms in the above expression that do not have a $\Delta x$ in 
them will be canceled when you subtract the second integral. All the terms that do have a $\Delta x$ will be untouched. The 
combination of the two integrals is then 

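$$(\Delta x)\frac{\partial v_x}{\partial x}(_0)\Delta y\Delta z + \frac12(\Delta x)^2\frac{\partial^2 v_x}{\partial x^2}(_0)\Delta y\Delta z + \frac12(\Delta x)\frac{\partial^2 v_x}{\partial x\,\partial y}(_0)(\Delta y)^2\Delta z + \cdots$$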

Two down four to go, but not really. The other integrals are the same except that x becomes y and y becomes 
z and z becomes x. The integral over the two faces with y constant are then 

$$(\Delta y)\frac{\partial v_y}{\partial y}(_0)\Delta z\Delta x + \frac12(\Delta y)^2\frac{\partial^2 v_y}{\partial y^2}(_0)\Delta z\Delta x + \cdots$$
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and a similar expression for the final two faces. The definition of Eq. (9.9) says to add all three of these expressions, 
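divide by the volume, and take the limit as the volume goes to zero. $V = \Delta x\Delta y\Delta z$, and you see that this is a common 
factor in all of the terms above. Cancel what you can and you have 

$$\frac{\partial v_x}{\partial x}(_0) + \frac{\partial v_y}{\partial y}(_0) + \frac{\partial v_z}{\partial z}(_0) + \frac12(\Delta x)\frac{\partial^2 v_x}{\partial x^2}(_0) + \cdots$$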




In the limit that all the $\Delta x$, $\Delta y$, and $\Delta z$ shrink to zero the terms with a second derivative vanish, as do all the other 
higher order terms. You are left then with a rather simple expression for the divergence. 


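$$\text{divergence of } \vec v = \operatorname{div}\vec v = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z} \tag{9.11}$$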


This is abbreviated by using the differential operator $\nabla$, "del." 

$$\nabla = \hat x\frac{\partial}{\partial x} + \hat y\frac{\partial}{\partial y} + \hat z\frac{\partial}{\partial z} \tag{9.12}$$


Then you can write the preceding equation as 


$$\text{divergence of } \vec v = \operatorname{div}\vec v = \nabla\cdot\vec v \tag{9.13}$$

The symbol $\nabla$ will take other forms in other coordinate systems. 

Now that you've waded through this rather technical set of manipulations, is there an easier way? Yes, but without 
having gone through the preceding algebra you won't be able to see and to understand which terms are important and 
which terms are going to cancel or otherwise disappear. When you need to apply these ideas to something besides 
rectangular coordinates you have to know what to keep and what to ignore. Once you know this, you can go straight 
to the end of the calculation and write down those terms that you know are going to survive, dropping the others. This 
takes practice. 

Simplifying the derivation 

In the long derivation of the divergence, the essence is that you find $\vec v\cdot\hat n$ on one side of the box (maybe take it in the 
center of the face), and multiply it by the area of that side. Do this on the other side, remembering that $\hat n$ isn't in the 
same direction there, and combine the results. Do this for each side and divide by the volume of the box. 

$$\Big[v_x(x_0+\Delta x,\ y_0+\Delta y/2,\ z_0+\Delta z/2)\Delta y\Delta z - v_x(x_0,\ y_0+\Delta y/2,\ z_0+\Delta z/2)\Delta y\Delta z\Big] \div (\Delta x\Delta y\Delta z) \tag{9.14}$$
the $\Delta y$ and $\Delta z$ factors cancel, and what's left is, in the limit $\Delta x\to 0$, the derivative $\partial v_x/\partial x$. 

I was careful to evaluate the values of v x in the center of the sides, but you see that it didn't matter. In the limit 
as all the sides go to zero I could just as easily have taken the coordinates at one corner and simplified the steps still more. 
Do this for the other sides, add, and you get the result. It all looks very simple when you do it this way, but what if you 
need to do it in cylindrical coordinates? 

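[Figure: small volume element in cylindrical coordinates, with edges labeled $r\Delta\phi$ and $\Delta z$]

When everything is small, the volume is close to a rectangular box, so its volume is $V = (\Delta r)(\Delta z)(r\Delta\phi)$. Go 
through the simple version for the calculation of the surface integral. The top and bottom present nothing significantly 
different from the rectangular case. 

$$\big[v_z(r_0,\phi_0,z_0+\Delta z) - v_z(r_0,\phi_0,z_0)\big](\Delta r)(r_0\Delta\phi) \div r_0\Delta r\Delta\phi\Delta z \longrightarrow \frac{\partial v_z}{\partial z}$$

The curved faces of constant $r$ are a bit different, because the areas of the two opposing faces aren't the same. 

$$\big[v_r(r_0+\Delta r,\phi_0,z_0)(r_0+\Delta r)\Delta\phi\Delta z - v_r(r_0,\phi_0,z_0)\,r_0\Delta\phi\Delta z\big] \div r_0\Delta r\Delta\phi\Delta z \longrightarrow \frac1r\frac{\partial(rv_r)}{\partial r}$$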

A bit more complex than the rectangular case, but not too bad. 

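Now for the constant $\phi$ sides. Here the areas of the two faces are the same, so even though they are not precisely 
parallel to each other this doesn't cause any difficulties. 

$$\big[v_\phi(r_0,\phi_0+\Delta\phi,z_0) - v_\phi(r_0,\phi_0,z_0)\big](\Delta r)(\Delta z) \div r_0\Delta r\Delta\phi\Delta z \longrightarrow \frac1r\frac{\partial v_\phi}{\partial\phi}$$

The sum of all these terms is the divergence expressed in cylindrical coordinates. 

$$\operatorname{div}\vec v = \frac1r\frac{\partial(rv_r)}{\partial r} + \frac1r\frac{\partial v_\phi}{\partial\phi} + \frac{\partial v_z}{\partial z} \tag{9.15}$$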
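
A numerical comparison is a cheap way to convince yourself that Eq. (9.15) and Eq. (9.11) describe the same thing. This sketch assumes a made-up field given by its cylindrical components, $v_r = r^2$, $v_\phi = r\sin\phi$, $v_z = zr$, for which Eq. (9.15) gives $4r + \cos\phi$ by hand. All derivatives are central differences, and the function names are mine, not part of the text.

```python
import math


def v_cyl(r, phi, z):
    # Hypothetical field given by its cylindrical components (v_r, v_phi, v_z).
    return (r * r, r * math.sin(phi), z * r)


def div_cylindrical(r, phi, z, h=1e-5):
    # Eq. (9.15) with a central difference for each derivative.
    d_rvr = ((r + h) * v_cyl(r + h, phi, z)[0]
             - (r - h) * v_cyl(r - h, phi, z)[0]) / (2 * h)
    d_vphi = (v_cyl(r, phi + h, z)[1] - v_cyl(r, phi - h, z)[1]) / (2 * h)
    d_vz = (v_cyl(r, phi, z + h)[2] - v_cyl(r, phi, z - h)[2]) / (2 * h)
    return d_rvr / r + d_vphi / r + d_vz


def v_cart(x, y, z):
    # The same field expressed in rectangular components.
    r, phi = math.hypot(x, y), math.atan2(y, x)
    vr, vphi, vz = v_cyl(r, phi, z)
    return (vr * math.cos(phi) - vphi * math.sin(phi),
            vr * math.sin(phi) + vphi * math.cos(phi),
            vz)


def div_rectangular(x, y, z, h=1e-5):
    # Eq. (9.11) with central differences.
    return ((v_cart(x + h, y, z)[0] - v_cart(x - h, y, z)[0])
            + (v_cart(x, y + h, z)[1] - v_cart(x, y - h, z)[1])
            + (v_cart(x, y, z + h)[2] - v_cart(x, y, z - h)[2])) / (2 * h)


r0, phi0, z0 = 1.5, 0.7, -0.4
cyl = div_cylindrical(r0, phi0, z0)
rect = div_rectangular(r0 * math.cos(phi0), r0 * math.sin(phi0), z0)
exact = 4 * r0 + math.cos(phi0)  # worked out by hand from Eq. (9.15)
print(cyl, rect, exact)
```

The three numbers should agree to many digits. If you drop the factor $r$ inside the first derivative, they will not, and that is the whole point of the unequal areas of the two curved faces.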


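The corresponding expression in spherical coordinates is found in exactly the same way, problem 9.4. 

$$\operatorname{div}\vec v = \frac{1}{r^2}\frac{\partial(r^2v_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(\sin\theta\,v_\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial v_\phi}{\partial\phi} \tag{9.16}$$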
These are the three commonly occurring coordinate systems, though the same simplified method will work in any 
other orthogonal coordinate system. The coordinate system is orthogonal if the surfaces made by setting the value of 
the respective coordinates to a constant intersect at right angles. In the spherical example this means that a surface of 
constant $r$ is a sphere. A surface of constant $\phi$ is a half-plane starting from the z-axis. These intersect perpendicular 
to each other. If you set the third coordinate, $\theta$, to a constant you have a cone that intersects the other two at right 
angles. Look back to section 8.8. 

9.4 Integral Representation of Curl 

The calculation of the divergence was facilitated by the fact that the equation (9.5) could be manipulated into the form 
of an integral, Eq. (9.9). Is there a similar expression for the curl? Yes. 


$$\operatorname{curl}\vec v = \lim_{V\to 0}\frac1V\oint d\vec A\times\vec v \tag{9.17}$$


For the divergence there was a logical and orderly development to derive Eq. (9.9) from (9.5). Is there a similar intuitively 
clear path here? I don't know of one. The best that I can do is to show that it gives the right answer. 

And what's that surface integral doing with a $\times$ instead of a $\cdot$? No mistake. Just replace the dot product by a 
cross product in the definition of the integral. This time however you have to watch the order of the factors. 

[Figure: closed surface with an outward area element $\hat n\,dA$]



To verify that this does give the correct answer, use a vector field that represents pure rigid body rotation. You're 
going to take the limit as $\Delta V\to 0$, so it may as well be uniform. The velocity field for this is the same as from 
problem 7.5. 

$$\vec v = \vec\omega\times\vec r \tag{9.18}$$

To evaluate the integral use a sphere of radius $R$ centered at the origin, making $\hat n = \hat r$. You also need the identity 
$\vec A\times(\vec B\times\vec C) = \vec B(\vec A\cdot\vec C) - \vec C(\vec A\cdot\vec B)$. 

$$d\vec A\times(\vec\omega\times\vec r) = \vec\omega(d\vec A\cdot\vec r) - \vec r(\vec\omega\cdot d\vec A) \tag{9.19}$$
Choose a spherical coordinate system with the z-axis along $\vec\omega$. 

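$d\vec A = \hat n\,dA = \hat r\,dA$, and $\vec\omega\cdot d\vec A = \omega\,dA\cos\theta$ 

$$\begin{aligned}
\oint d\vec A\times\vec v &= \oint \vec\omega R\,dA - \vec r\,\omega\,dA\cos\theta \\
&= \vec\omega R\,4\pi R^2 - \omega\int_0^\pi R^2\sin\theta\,d\theta\int_0^{2\pi}d\phi\;\hat z R\cos\theta\cos\theta \\
&= \vec\omega R\,4\pi R^2 - \omega\hat z\,2\pi R^3\int_0^\pi \sin\theta\,d\theta\cos^2\theta \\
&= \vec\omega R\,4\pi R^2 - \omega\hat z\,2\pi R^3\int_{-1}^{1}\cos^2\theta\,d\cos\theta = \vec\omega R\,4\pi R^2 - \omega\hat z\,2\pi R^3\cdot\frac23 \\
&= \vec\omega\,\frac83\pi R^3
\end{aligned}$$

Divide by the volume of the sphere and you have $2\vec\omega$ as promised. In the first term on the first line of the calculation, 
$\vec\omega R$ is a constant over the surface so you can pull it out of the integral. In the second term, $\vec r$ has components in the x, 
y, and z directions; the first two of these integrate to zero because for every vector with a positive x-component there 
is one that has a negative component. Same for y. All that is left of $\vec r$ is $\hat z R\cos\theta$. 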
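
If you would rather see the factor of 2 come out of a computer than out of the algebra, the surface integral of Eq. (9.17) is easy to approximate. This is a sketch, not a polished integrator: it assumes a midpoint rule in $\theta$ and $\phi$ on a small sphere at the origin, and the angular velocity `omega` is an arbitrary made-up vector deliberately not aligned with any axis.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curl_from_surface_integral(v, R=1e-2, n_theta=60, n_phi=120):
    """Approximate (1/V) * (closed-surface integral of dA x v) on a sphere of radius R."""
    total = [0.0, 0.0, 0.0]
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Outward unit normal; on a sphere centered at the origin it is r-hat.
            n_hat = (math.sin(theta) * math.cos(phi),
                     math.sin(theta) * math.sin(phi),
                     math.cos(theta))
            point = tuple(R * c for c in n_hat)
            dA = R * R * math.sin(theta) * d_theta * d_phi
            term = cross(n_hat, v(point))  # order matters: dA x v
            for k in range(3):
                total[k] += term[k] * dA
    volume = 4 * math.pi * R**3 / 3
    return tuple(t / volume for t in total)


omega = (0.3, -1.2, 2.0)  # hypothetical angular velocity


def rigid_rotation(r):
    # Eq. (9.18): v = omega x r
    return cross(omega, r)


result = curl_from_surface_integral(rigid_rotation)
print(result)  # each component should be close to twice that of omega
```

Swap the order of the factors in `cross(n_hat, v(point))` and the answer changes sign, which is the caution about the order of the cross product.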


The Curl in Components 

With the integral representation, Eq. (9.17), available for the curl, the process is much like that for computing the 
divergence. Start with rectangular of course. Use the same equation, Eq. (9.10) and the same picture that accompanied 
that equation. With the experience gained from computing the divergence however, you don't have to go through all 
the complications of the first calculation. Use the simpler form that followed. 

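In Eq. (9.14) you have $\vec v\cdot\Delta\vec A = v_x\Delta y\Delta z$ on the right face and on the left face. This time replace the dot with 
a cross (in the right order). 

On the right, 

$$\Delta\vec A\times\vec v = \Delta y\Delta z\,\hat x\times\vec v(x_0+\Delta x,\ y_0+\Delta y/2,\ z_0+\Delta z/2) \tag{9.20}$$

On the left it is 

$$\Delta\vec A\times\vec v = \Delta y\Delta z\,\hat x\times\vec v(x_0,\ y_0+\Delta y/2,\ z_0+\Delta z/2) \tag{9.21}$$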
When you subtract the second from the first and divide by the volume, $\Delta x\Delta y\Delta z$, what is left is (in the limit $\Delta x\to 0$) 
a derivative. 

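$$\hat x\times\frac{\vec v(x_0+\Delta x,y_0,z_0) - \vec v(x_0,y_0,z_0)}{\Delta x} \longrightarrow \hat x\times\frac{\partial\vec v}{\partial x} = \hat z\frac{\partial v_y}{\partial x} - \hat y\frac{\partial v_z}{\partial x}$$

Similar calculations for the other four faces of the box give results that you can get simply by changing the labels: 
$x\to y\to z\to x$, a cyclic permutation of the indices. The result can be expressed most succinctly in terms of $\nabla$. 

$$\operatorname{curl}\vec v = \hat x\Big(\frac{\partial v_z}{\partial y} - \frac{\partial v_y}{\partial z}\Big) + \hat y\Big(\frac{\partial v_x}{\partial z} - \frac{\partial v_z}{\partial x}\Big) + \hat z\Big(\frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}\Big) = \nabla\times\vec v \tag{9.22}$$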


In the process of this calculation the normal vector $\hat x$ was parallel on the opposite faces of the box (except for a 
reversal of direction). Watch out in other coordinate systems and you'll see that this isn’t always true. Just draw the 
picture in cylindrical coordinates and this will be clear. 

9.5 The Gradient 

The gradient is the closest thing to an ordinary derivative here, taking a scalar-valued function into a vector field. The 
simplest geometric definition is “the derivative of a function with respect to distance along the direction in which the 
function changes most rapidly,” and the direction of the gradient vector is along that most-rapidly-changing direction. If 
you're dealing with one dimension, ordinary real-valued functions of real variables, the gradient is the ordinary derivative. 
Section 8.5 has some discussion and examples of this, including its use in various coordinate systems. It is most 
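conveniently expressed in terms of $\nabla$. 

$$\operatorname{grad} f = \nabla f \tag{9.23}$$

The equations (8.15), (8.27), and (8.28) show the gradient (and correspondingly $\nabla$) in three coordinate systems. 

$$\begin{aligned}
\text{rectangular:}\quad \nabla &= \hat x\frac{\partial}{\partial x} + \hat y\frac{\partial}{\partial y} + \hat z\frac{\partial}{\partial z} \\
\text{cylindrical:}\quad \nabla &= \hat r\frac{\partial}{\partial r} + \hat\phi\frac1r\frac{\partial}{\partial\phi} + \hat z\frac{\partial}{\partial z} \\
\text{spherical:}\quad \nabla &= \hat r\frac{\partial}{\partial r} + \hat\theta\frac1r\frac{\partial}{\partial\theta} + \hat\phi\frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}
\end{aligned} \tag{9.24}$$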
In all nine of these components, the denominator (e.g. $r\sin\theta\,d\phi$) is the element of displacement along the direction 
indicated. 


9.6 Shorter Cut for div and curl 

There is another way to compute the divergence and curl in cylindrical and spherical coordinates. A direct application 
of Eqs. (9.13), (9.22), and (9.24) gets the result quickly. The major caution is that you have to be careful that the unit 
vectors are inside the derivative, so you have to differentiate them too. 

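$\nabla\cdot\vec v$ is the divergence of $\vec v$, and in cylindrical coordinates 

$$\nabla\cdot\vec v = \Big(\hat r\frac{\partial}{\partial r} + \hat\phi\frac1r\frac{\partial}{\partial\phi} + \hat z\frac{\partial}{\partial z}\Big)\cdot\big(\hat r v_r + \hat\phi v_\phi + \hat z v_z\big) \tag{9.25}$$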


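The unit vectors $\hat r$, $\hat\phi$, and $\hat z$ don't change as you alter $r$ or $z$. They do change as you alter $\phi$ (except for $\hat z$). 

$$\frac{\partial\hat r}{\partial r} = \frac{\partial\hat\phi}{\partial r} = \frac{\partial\hat z}{\partial r} = \frac{\partial\hat r}{\partial z} = \frac{\partial\hat\phi}{\partial z} = \frac{\partial\hat z}{\partial z} = \frac{\partial\hat z}{\partial\phi} = 0 \tag{9.26}$$

Next come $\partial\hat r/\partial\phi$ and $\partial\hat\phi/\partial\phi$. This is problem 8.20. You can do this by first showing that 

$$\hat r = \hat x\cos\phi + \hat y\sin\phi \quad\text{and}\quad \hat\phi = -\hat x\sin\phi + \hat y\cos\phi \tag{9.27}$$

and differentiating with respect to $\phi$. This gives 

$$\partial\hat r/\partial\phi = \hat\phi, \quad\text{and}\quad \partial\hat\phi/\partial\phi = -\hat r \tag{9.28}$$

Put these together and you have 

$$\begin{aligned}
\nabla\cdot\vec v &= \frac{\partial v_r}{\partial r} + \hat\phi\cdot\frac1r\frac{\partial}{\partial\phi}\big(\hat r v_r + \hat\phi v_\phi\big) + \frac{\partial v_z}{\partial z} \\
&= \frac{\partial v_r}{\partial r} + \hat\phi\cdot\frac1r\Big(v_r\frac{d\hat r}{d\phi} + \hat\phi\frac{\partial v_\phi}{\partial\phi}\Big) + \frac{\partial v_z}{\partial z} \\
&= \frac{\partial v_r}{\partial r} + \frac1r v_r + \frac1r\frac{\partial v_\phi}{\partial\phi} + \frac{\partial v_z}{\partial z}
\end{aligned} \tag{9.29}$$

The terms $\hat r\,\partial v_r/\partial\phi$ and $v_\phi\,\partial\hat\phi/\partial\phi = -v_\phi\hat r$ are missing from the second line because they are perpendicular to $\hat\phi$ and their dot product with it is zero. 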
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This agrees with equation (9.15). 

Similarly you can use the results of problem 8.15 to find the derivatives of the corresponding vectors in spherical 
coordinates. The non-zero values are 


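$$\begin{aligned}
\frac{\partial\hat r}{\partial\phi} &= \hat\phi\sin\theta, & \frac{\partial\hat\theta}{\partial\phi} &= \hat\phi\cos\theta, & \frac{\partial\hat\phi}{\partial\phi} &= -\hat r\sin\theta - \hat\theta\cos\theta \\
\frac{\partial\hat r}{\partial\theta} &= \hat\theta, & \frac{\partial\hat\theta}{\partial\theta} &= -\hat r
\end{aligned} \tag{9.30}$$

The result is for spherical coordinates 

$$\nabla\cdot\vec v = \frac{1}{r^2}\frac{\partial(r^2v_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(\sin\theta\,v_\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial v_\phi}{\partial\phi} \tag{9.31}$$

The expressions for the curl are, cylindrical: 

$$\nabla\times\vec v = \hat r\Big(\frac1r\frac{\partial v_z}{\partial\phi} - \frac{\partial v_\phi}{\partial z}\Big) + \hat\phi\Big(\frac{\partial v_r}{\partial z} - \frac{\partial v_z}{\partial r}\Big) + \hat z\Big(\frac1r\frac{\partial(rv_\phi)}{\partial r} - \frac1r\frac{\partial v_r}{\partial\phi}\Big) \tag{9.32}$$

and spherical: 

$$\nabla\times\vec v = \hat r\frac{1}{r\sin\theta}\Big(\frac{\partial(\sin\theta\,v_\phi)}{\partial\theta} - \frac{\partial v_\theta}{\partial\phi}\Big) + \hat\theta\Big(\frac{1}{r\sin\theta}\frac{\partial v_r}{\partial\phi} - \frac1r\frac{\partial(rv_\phi)}{\partial r}\Big) + \hat\phi\frac1r\Big(\frac{\partial(rv_\theta)}{\partial r} - \frac{\partial v_r}{\partial\theta}\Big) \tag{9.33}$$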


9.7 Identities for Vector Operators 

Some of the common identities can be proved simply by computing them in rectangular components. These are vectors, 
and if you show that one vector equals another vector it doesn't matter that you used a simple coordinate system to 
demonstrate the fact. Of course there are some people who quite properly complain about the inelegance of such a 
procedure. They’re called mathematicians. 

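$$\nabla\cdot\nabla\times\vec v = 0 \qquad \nabla\times\nabla f = 0 \qquad \nabla\times\nabla\times\vec v = \nabla(\nabla\cdot\vec v) - (\nabla\cdot\nabla)\vec v \tag{9.34}$$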

There are many other identities, but these are the big three. 


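$$\oint\vec v\cdot d\vec A = \int d^3r\,\nabla\cdot\vec v \qquad\text{and}\qquad \oint\vec v\cdot d\vec\ell = \int\nabla\times\vec v\cdot d\vec A \tag{9.35}$$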


are the two fundamental integral relationships, going under the names of Gauss and Stokes. See chapter 13 for the 
proofs of these integral relations. 
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9.8 Applications to Gravity 

The basic equations to describe the gravitational field in Newton’s theory are 


$$\nabla\cdot\vec g = -4\pi G\rho, \qquad\text{and}\qquad \nabla\times\vec g = 0 \tag{9.36}$$

In these equations, the vector field $\vec g$ is defined by placing a (very small) test mass $m$ at a point and measuring the 
gravitational force on it. This force is proportional to $m$ itself, and the proportionality factor is called the gravitational 
field $\vec g$. The other symbol used here is $\rho$, and that is the volume mass density, $dm/dV$, of the matter that is generating 
the gravitational field. $G$ is Newton's gravitational constant: $G = 6.67\times10^{-11}\ \mathrm{N\cdot m^2/kg^2}$. 



For the first example of solutions to these equations, take the case of a spherically symmetric mass that is the 
source of a gravitational field. Assume also that its density is constant inside; the total mass is M and it occupies 
a sphere of radius $R$. Whatever $\vec g$ is, it has only a radial component, $\vec g = g_r\hat r$. Proof: Suppose it has a sideways 
component at some point. Rotate the whole system by 180° about an axis that passes through this point and through 
the center of the sphere. The system doesn’t change because of this, but the sideways component of g would reverse. 
That can't happen. 

The component $g_r$ can't depend on either $\theta$ or $\phi$ because the source doesn't change if you rotate it about any 
axis; it's spherically symmetric. 

$$\vec g = g_r(r)\hat r \tag{9.37}$$

Now compute the divergence and the curl of this field. Use Eqs. (9.16) and (9.33) to get 

$$\nabla\cdot g_r(r)\hat r = \frac{1}{r^2}\frac{d(r^2g_r)}{dr} \qquad\text{and}\qquad \nabla\times g_r(r)\hat r = 0$$


The first equation says that the divergence of $\vec g$ is proportional to $\rho$. 

$$\frac{1}{r^2}\frac{d(r^2g_r)}{dr} = -4\pi G\rho \tag{9.38}$$
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Outside the surface r = R, the mass density is zero, so this is 


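$$\frac{1}{r^2}\frac{d(r^2g_r)}{dr} = 0, \qquad\text{implying}\qquad r^2g_r = C \qquad\text{and}\qquad g_r = \frac{C}{r^2}$$

where $C$ is some as yet undetermined constant. Now do this inside. 

$$\frac{1}{r^2}\frac{d(r^2g_r)}{dr} = -4\pi G\rho_0, \qquad\text{where}\qquad \rho_0 = 3M/4\pi R^3$$

This is 

$$\frac{d(r^2g_r)}{dr} = -4\pi G\rho_0r^2, \qquad\text{so}\qquad r^2g_r = -\frac43\pi G\rho_0r^3 + C', \qquad\text{or}\qquad g_r(r) = -\frac43\pi G\rho_0r + \frac{C'}{r^2}$$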

There are two constants that you have to evaluate: $C$ and $C'$. The latter has to be zero, because $C'/r^2\to\infty$ as $r\to 0$, 
and there's nothing in the mass distribution that will cause this. As for the other, note that $g_r$ must be continuous at 
the surface of the mass. If it isn't, then when you try to differentiate it in Eq. (9.38) you'll be differentiating a step 
function and you get an infinite derivative there (and the mass density isn't infinite there). 

$$g_r(R-) = -\frac43\pi G\rho_0R = g_r(R+) = \frac{C}{R^2}$$

Solve for $C$ and you have 

$$C = -\frac43\pi G\rho_0R^3 = -\frac43\pi G\frac{3M}{4\pi R^3}R^3 = -GM$$

Put this all together and express the density $\rho_0$ in terms of $M$ and $R$ to get 

$$g_r(r) = \begin{cases} -GM/r^2 & (r > R) \\ -GMr/R^3 & (r < R) \end{cases} \tag{9.39}$$


This says that outside the spherical mass distribution you can’t tell what its radius R is. It creates the same 
gravitational field as a point mass. Inside the uniform sphere, the field drops to zero linearly toward the center. For a 
slight variation on how to do this calculation see problem 9.14. 
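
Following the advice to analyze every answer, here is a small numerical check of Eq. (9.39): is it continuous at the surface, and does it satisfy Eq. (9.38) inside and outside? Treat it as a sketch. The mass and radius are assumed round numbers roughly the size of the Earth, not values taken from the text, and `radial_divergence` is a name I made up for the left side of Eq. (9.38) evaluated with a central difference.

```python
import math

G = 6.67e-11  # N m^2 / kg^2, the value quoted in the text


def g_r(r, M, R):
    """Radial field of a uniform sphere, Eq. (9.39); negative means inward."""
    if r >= R:
        return -G * M / r**2
    return -G * M * r / R**3


def radial_divergence(r, M, R, h=1.0):
    # (1/r^2) d(r^2 g_r)/dr by a central difference with step h (meters).
    return ((r + h)**2 * g_r(r + h, M, R)
            - (r - h)**2 * g_r(r - h, M, R)) / (2 * h * r**2)


M, R = 6.0e24, 6.4e6  # assumed illustrative values, in kg and m
rho0 = 3 * M / (4 * math.pi * R**3)

surface = g_r(R, M, R)
just_inside = g_r(R * (1 - 1e-12), M, R)
halfway = g_r(R / 2, M, R)

print(surface, just_inside)                     # continuity at r = R
print(halfway / surface)                        # linear drop toward the center
print(radial_divergence(R / 2, M, R), -4 * math.pi * G * rho0)
print(radial_divergence(2 * R, M, R))           # zero where there is no mass
```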
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Non-uniform density 

The density of the Earth is not uniform; it's bigger in the center. The gravitational field even increases as you go down 
below the Earth's surface. What does this tell you about the density function? $\nabla\cdot\vec g = -4\pi G\rho$ remains true, and I'll 
continue to make the approximation of spherical symmetry, so this is 

$$\frac{1}{r^2}\frac{d(r^2g_r)}{dr} = \frac{dg_r}{dr} + \frac2rg_r = -4\pi G\rho(r) \tag{9.40}$$


That gravity increases with depth (for a little while) says 

$$\frac{dg_r}{dr} = -4\pi G\rho(r) - \frac2rg_r > 0$$

Why $> 0$? Remember: $g_r$ is itself negative and $r$ is measured outward. Sort out the signs and solve for the density to get 

$$\rho(r) < -\frac{1}{2\pi Gr}g_r$$

At the surface, $g_r(R) = -GM/R^2$, so this is 

$$\rho(R) < \frac{M}{2\pi R^3} = \frac23\cdot\frac{3M}{4\pi R^3} = \frac23\rho_{\text{average}}$$

The mean density of the Earth is 5.5 gram/cm$^3$, so this bound is 3.7 gram/cm$^3$. Pick up a random rock. What is its 
density? 
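
The arithmetic behind that bound is short enough to check by machine. This is a minimal sketch: the mass and radius are assumed round numbers (they cancel in the ratio anyway), and the function names are mine.

```python
import math

G = 6.67e-11  # N m^2 / kg^2


def surface_density_bound(M, R):
    """Upper bound on rho(R) when the field strength grows with depth just below r = R."""
    g_surface = -G * M / R**2
    return -g_surface / (2 * math.pi * G * R)


def average_density(M, R):
    return 3 * M / (4 * math.pi * R**3)


M, R = 6.0e24, 6.4e6  # assumed illustrative values, in kg and m
ratio = surface_density_bound(M, R) / average_density(M, R)
bound_for_quoted_mean = 5.5 * 2 / 3  # gram/cm^3, using the mean density in the text

print(ratio)                  # 2/3, independent of M and R
print(bound_for_quoted_mean)  # rounds to 3.7
```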


9.9 Gravitational Potential 

The gravitational potential is that function $V$ for which 

$$\vec g = -\nabla V \tag{9.41}$$

That such a function even exists is not instantly obvious, but it is a consequence of the second of the two defining 
equations (9.36). If you grant that, then you can get an immediate equation for $V$ by substituting it into the first of 
(9.36). 

$$\nabla\cdot\vec g = -\nabla\cdot\nabla V = -4\pi G\rho, \qquad\text{or}\qquad \nabla^2V = 4\pi G\rho \tag{9.42}$$

This is a scalar equation instead of a vector equation, so it will often be easier to handle. Apply it to the same example 
as above, the uniform spherical mass. 
The Laplacian, $\nabla^2$, is the divergence of the gradient, so to express it in spherical coordinates, combine Eqs. (9.24) 
and (9.31). 


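$$\nabla^2V = \frac{1}{r^2}\frac{\partial}{\partial r}\Big(r^2\frac{\partial V}{\partial r}\Big) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\frac{\partial V}{\partial\theta}\Big) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2V}{\partial\phi^2} \tag{9.43}$$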


Because the mass is spherical it doesn’t change no matter how you rotate it so the same thing holds for the 
solution, $V(r)$. Use this spherical coordinate representation of $\nabla^2$ and for this case the $\theta$ and $\phi$ derivatives vanish. 

$$\frac{1}{r^2}\frac{d}{dr}\Big(r^2\frac{dV}{dr}\Big) = 4\pi G\rho(r) \tag{9.44}$$

I changed from $\partial$ to $d$ because there's now one independent variable, not several. Just as with Eq. (9.38) I'll divide this 
into two cases, inside and outside. 


$$\text{Outside:}\qquad \frac{1}{r^2}\frac{d}{dr}\Big(r^2\frac{dV}{dr}\Big) = 0 \qquad\text{so}\qquad r^2\frac{dV}{dr} = C$$

Continue solving this and you have 

$$\frac{dV}{dr} = \frac{C}{r^2} \longrightarrow V(r) = -\frac{C}{r} + D \qquad (r > R) \tag{9.45}$$

$$\text{Inside:}\qquad \frac{1}{r^2}\frac{d}{dr}\Big(r^2\frac{dV}{dr}\Big) = 4\pi G\rho_0 \qquad\text{so}\qquad r^2\frac{dV}{dr} = 4\pi G\rho_0\frac{r^3}{3} + C'$$

Continue, dividing by $r^2$ and integrating, 

$$V(r) = 4\pi G\rho_0\frac{r^2}{6} - \frac{C'}{r} + D' \qquad (r < R) \tag{9.46}$$

There are now four arbitrary constants to examine. Start with $C'$. It's the coefficient of $1/r$ in the domain where $r < R$. 
That means that it blows up as $r\to 0$, but there's nothing at the origin to cause this. $C' = 0$. Notice that the same 
argument does not eliminate $C$ because (9.45) applies only for $r > R$. 

Boundary Conditions 

Now for the boundary conditions at r = R. There are a couple of ways to determine this. I find the simplest and the 
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most general approach is to recognize that the equations (9.42) and (9.44) must be satisfied everywhere. That means 
not just outside, not just inside, but at the surface too. The consequence of this statement is the result* 

$$V \text{ is continuous at } r = R \qquad\qquad dV/dr \text{ is continuous at } r = R \tag{9.47}$$

Where do these continuity conditions come from? Assume for a moment that the first one is false, that $V$ is discontinuous 
at $r = R$, and look at the proposition graphically. If $V$ changes value in a very small interval the graphs of $V$, of $dV/dr$, 
and of $d^2V/dr^2$ look like 

[Figure: graphs of $V$ (a step), $dV/dr$ (a spike), and $d^2V/dr^2$ (a double spike) near $r = R$]

The second derivative on the left side of Eq. (9.44) has a double spike that does not appear on the right side. It 
can't be there, so my assumption that $V$ is discontinuous is false and $V$ must be continuous. 

Assume next that $V$ is continuous but its derivative is not. The graphs of $V$, of $dV/dr$, and of $d^2V/dr^2$ then 
look like 

[Figure: graphs of $V$ (a kink), $dV/dr$ (a step), and $d^2V/dr^2$ (a spike) near $r = R$]



* Watch out for the similar looking equations that appear in electrostatics. Only the first of these equations holds 
there; the second must be modified by a dielectric constant. 
The second derivative on the left side of Eq. (9.44) still has a spike in it and there is no such spike in the $\rho$ on 
the right side. This is impossible, so $dV/dr$ too must be continuous. 

Back to the Problem 

Of the four constants that appear in Eqs. (9.45) and (9.46), one is already known, C' . For the rest, 

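$$V(R-) = V(R+) \quad\text{is}\quad 4\pi G\rho_0\frac{R^2}{6} + D' = -\frac{C}{R} + D$$

$$\frac{dV}{dr}(R-) = \frac{dV}{dr}(R+) \quad\text{is}\quad 8\pi G\rho_0\frac{R}{6} = +\frac{C}{R^2}$$

These two equations determine two of the constants. 

$$C = 4\pi G\rho_0\frac{R^3}{3}, \quad\text{then}\quad D - D' = 4\pi G\rho_0\frac{R^2}{6} + 4\pi G\rho_0\frac{R^2}{3} = 2\pi G\rho_0R^2$$

Put this together and you have 

$$V(r) = \begin{cases} \frac23\pi G\rho_0r^2 - 2\pi G\rho_0R^2 + D & (r < R) \\ -\frac43\pi G\rho_0R^3/r + D & (r > R) \end{cases} \tag{9.48}$$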


Did I say that the use of potentials is supposed to simplify the problems? Yes, but only the harder problems. The 
negative gradient of Eq. (9.48) should be g. Is it? The constant D can't be determined and is arbitrary. You may choose 
it to be zero. 
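
The question "Is it?" deserves an answer you work out yourself, but a numerical check is a reasonable first pass. In this sketch the negative derivative of Eq. (9.48) is taken by a central difference and compared with Eq. (9.39). The mass and radius are assumed round numbers, $D$ is set to zero, and the function names are mine.

```python
import math

G = 6.67e-11  # N m^2 / kg^2


def potential(r, M, R, D=0.0):
    """Eq. (9.48) with rho_0 = 3M / (4 pi R^3)."""
    rho0 = 3 * M / (4 * math.pi * R**3)
    if r < R:
        return (2 / 3) * math.pi * G * rho0 * r**2 - 2 * math.pi * G * rho0 * R**2 + D
    return -(4 / 3) * math.pi * G * rho0 * R**3 / r + D


def field_from_potential(r, M, R, h=1.0):
    # g_r = -dV/dr, central difference with step h (meters).
    return -(potential(r + h, M, R) - potential(r - h, M, R)) / (2 * h)


def g_r(r, M, R):
    # Eq. (9.39)
    return -G * M / r**2 if r >= R else -G * M * r / R**3


M, R = 6.0e24, 6.4e6  # assumed illustrative values, in kg and m
inside = (field_from_potential(0.5 * R, M, R), g_r(0.5 * R, M, R))
outside = (field_from_potential(2.0 * R, M, R), g_r(2.0 * R, M, R))
jump = potential(R, M, R) - potential(R * (1 - 1e-12), M, R)

print(inside)
print(outside)
print(jump)  # tiny compared with the potential itself: V is continuous at r = R
```

Changing `D` shifts both branches of the potential by the same amount and leaves the field untouched, which is why it can't be determined.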

Magnetic Boundary Conditions 

The equations for (time independent) magnetic fields are 

$$\nabla\times\vec B = \mu_0\vec J \qquad\text{and}\qquad \nabla\cdot\vec B = 0 \tag{9.49}$$

The vector $\vec J$ is the current density, the current per area, defined so that across a tiny area $d\vec A$ the current that flows 
through the area is $dI = \vec J\cdot d\vec A$. (This is precisely parallel to Eq. (9.1) for fluid flow rate.) In a wire of radius $R$, carrying 
a uniform current $I$, the magnitude of $\vec J$ is $I/\pi R^2$. These equations are sort of the reverse of Eq. (9.36). 
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If there is a discontinuity in the current density at a surface such as the edge of a wire, will there be some sort of 
corresponding discontinuity in the magnetic field? Use the same type of analysis as followed Eq. (9.47) for the possible 
discontinuities in the potential function. Take the surface of discontinuity of the current density to be the x-y plane, 
z = 0 and write the divergence equation 


$$\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} = 0$$

If there is a discontinuity, it will be in the $z$ variable. Perhaps $B_x$ or $B_z$ is discontinuous at the x-y plane. The divergence 
equation has a derivative with respect to $z$ only on $B_z$. If one of the other components changes abruptly at the surface, 
this equation causes no problem; nothing special happens in the x or y direction. If $B_z$ changes at the surface then 
the derivative $\partial B_z/\partial z$ has a spike. Nothing else in the equation has a spike, so there's no way that you can satisfy the 
equation. Conclusion: The normal component of $\vec B$ is continuous at the surface. 

What does the curl say? 

$$\hat x\Big(\frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z}\Big) + \hat y\Big(\frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x}\Big) + \hat z\Big(\frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y}\Big) = \mu_0\big(\hat xJ_x + \hat yJ_y + \hat zJ_z\big)$$

Derivatives with respect to x or y don't introduce a problem at the surface, all the action is again along $z$. Only the 
terms with a $\partial/\partial z$ will raise a question. If $B_x$ is discontinuous at the surface, then its derivative with respect to $z$ will 
have a spike in the y direction that has no other term to balance it. ($J_y$ has a step here but not a spike.) Similarly for 
$B_y$. This can't happen, so the conclusion: The tangential component of $\vec B$ is continuous at the surface. 

What if the surface isn't a plane? Maybe it is a cylinder or a sphere. In a small enough region, both of these look 
like planes. That's why there is still* a Flat Earth Society. 

9.10 Index Notation 

In section 7.11 I introduced the summation convention for repeated indices. I'm now going to go over it again and 
emphasize its utility in practical calculations. 

When you want to work in a rectangular coordinate system, with basis vectors $\hat x$, $\hat y$, and $\hat z$, it is convenient to use a more orderly notation for the basis vectors instead of just a sequence of letters of the alphabet. Call them $\hat e_1$, $\hat e_2$, and $\hat e_3$. (More indices if you have more dimensions.) I'll keep the assumption that these are orthogonal unit vectors so that

$$\hat e_1\cdot\hat e_2 = 0,\qquad \hat e_3\cdot\hat e_3 = 1,\quad\text{etc.}$$


* en.wikipedia.org/wiki/Flat_Earth_Society 




More generally, write this in the compact notation

$$\hat e_i\cdot\hat e_j = \delta_{ij} = \begin{cases}1 & (i = j)\\ 0 & (i \ne j)\end{cases} \tag{9.50}$$

The Kronecker delta is either one or zero depending on whether $i = j$ or $i \ne j$, and this equation sums up the properties of the basis in a single, compact notation. You can now write a vector in this basis as

$$\vec A = A_1\hat e_1 + A_2\hat e_2 + A_3\hat e_3 = A_i\hat e_i$$


The last expression uses the summation convention that a repeated index is summed over its range. When an index is 
repeated in a term, you will invariably have exactly two instances of the index; if you have three it's a mistake. 

When you add or subtract vectors, the index notation is

$$\vec A + \vec B = \vec C = A_i\hat e_i + B_i\hat e_i = (A_i + B_i)\hat e_i = C_i\hat e_i \qquad\text{or}\qquad A_i + B_i = C_i$$

Does it make sense to write $\vec A + \vec B = A_i\hat e_i + B_k\hat e_k$? Yes, but it's sort of pointless and confusing. You can change any summed index to any label you find convenient; they're just dummy variables.

$$A_i\hat e_i = A_\ell\hat e_\ell = A_m\hat e_m = A_1\hat e_1 + A_2\hat e_2 + A_3\hat e_3$$

You can sometimes use this freedom to help do manipulations, but in the example $A_i\hat e_i + B_k\hat e_k$ it's no help at all. Combinations such as

$$E_i + F_i \qquad\text{or}\qquad E_kF_kG_i = H_i \qquad\text{or}\qquad M_{k\ell}D_\ell = F_k$$

are valid. The last is simply Eq. (7.8) for a matrix times a column matrix.

$$A_i + B_j = C_k \qquad\text{or}\qquad E_mF_mG_m \qquad\text{or}\qquad C_k = A_{ij}B_j$$

have no meaning.

You can manipulate the indices for your convenience as long as you follow the rules.

$$A_i = B_{ij}C_j \quad\text{is the same as}\quad A_k = B_{kn}C_n \quad\text{or}\quad A_\ell = B_{\ell p}C_p$$

The scalar product has a simple form in index notation:

$$\vec A\cdot\vec B = A_i\hat e_i\cdot B_j\hat e_j = A_iB_j\,\hat e_i\cdot\hat e_j = A_iB_j\delta_{ij} = A_iB_i \tag{9.51}$$

The final equation comes by doing one of the two sums (say $j$), and only the term with $j = i$ survives, producing the final expression. The result shows that the sum over the repeated index is the scalar product in disguise.
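
A short numerical check can make the collapse of the double sum concrete. The sketch below is only an illustration: the helper names `delta`, `dot_double_sum`, and `dot_single_sum` and the sample components are invented for this example. It writes out $A_iB_j\delta_{ij}$ term by term and compares it with the single sum $A_iB_i$.

```python
def delta(i, j):
    """Kronecker delta: 1 when the two indices match, 0 otherwise."""
    return 1 if i == j else 0


def dot_double_sum(A, B):
    """A_i B_j delta_ij with both sums over i and j written out explicitly."""
    n = len(A)
    return sum(A[i] * B[j] * delta(i, j) for i in range(n) for j in range(n))


def dot_single_sum(A, B):
    """A_i B_i: what remains after the delta has done the j-sum."""
    return sum(a * b for a, b in zip(A, B))


# Sample components, chosen arbitrarily for the illustration.
A = [1.0, -2.0, 3.0]
B = [4.0, 0.5, 2.0]

print(dot_double_sum(A, B), dot_single_sum(A, B))
```

Of the nine terms in the double sum, six are multiplied by zero; the three survivors are exactly the terms of the ordinary dot product.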

Just as the dot product of the basis vectors is the delta-symbol, the cross product provides another important index function, the alternating symbol.

$$\hat e_i\cdot\hat e_j\times\hat e_k = \epsilon_{ijk} \tag{9.52}$$

$$\begin{aligned}
\hat e_2\times\hat e_3 \equiv \hat y\times\hat z = \hat e_1, &\quad\text{so}\quad \epsilon_{123} = \hat e_1\cdot\hat e_2\times\hat e_3 = 1\\
\hat e_2\times\hat e_3 = -\hat e_3\times\hat e_2, &\quad\text{so}\quad \epsilon_{132} = -\epsilon_{123} = -1\\
\hat e_3\cdot\hat e_1\times\hat e_2 = \hat e_3\cdot\hat e_3 = 1, &\quad\text{so}\quad \epsilon_{312} = \epsilon_{123} = 1\\
\hat e_1\cdot\hat e_3\times\hat e_3 = \hat e_1\cdot 0 = 0, &\quad\text{so}\quad \epsilon_{133} = 0
\end{aligned}$$

If the indices are a cyclic permutation of 123, (231 or 312), the alternating symbol is 1.

If the indices are an odd permutation of 123, (132 or 321 or 213), the symbol is $-1$.

If any two of the indices are equal the alternating symbol is zero, and that finishes all the cases. The last property is easy to see because if you interchange any two indices the sign changes. If the indices are the same, the sign can't change so it must be zero.
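
The cases above can be tabulated mechanically. The following sketch is an illustration under stated assumptions: the basis vectors are stored as plain tuples, and the helpers `cross`, `dot`, and `eps` are names made up here. Note that `cross` uses the familiar component formula for the cross product, so this is a consistency check of the definition in Eq. (9.52), not an independent derivation.

```python
from itertools import product


def cross(a, b):
    """Ordinary component formula for the cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Ordinary dot product."""
    return sum(x * y for x, y in zip(a, b))


# Rectangular basis vectors e_1, e_2, e_3.
E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def eps(i, j, k):
    """Alternating symbol from its definition e_i . (e_j x e_k); indices run 1..3."""
    return dot(E[i - 1], cross(E[j - 1], E[k - 1]))


# Collect the index triples for which the symbol does not vanish.
nonzero = {ijk: eps(*ijk) for ijk in product((1, 2, 3), repeat=3) if eps(*ijk) != 0}
for ijk, value in sorted(nonzero.items()):
    print(ijk, value)
```

Of the 27 index combinations only the six permutations of 123 survive, three with each sign.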

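Use the alternating symbol to write the cross product itself.

$$\vec A\times\vec B = A_i\hat e_i\times B_j\hat e_j \qquad\text{and the $k$-component is}$$

$$\hat e_k\cdot\vec A\times\vec B = \hat e_k\cdot A_iB_j\,\hat e_i\times\hat e_j = \epsilon_{kij}A_iB_j$$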
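
As a hedged illustration of the component formula $\epsilon_{kij}A_iB_j$, the sketch below sums over the repeated indices directly. The function names and sample vectors are invented for the example, and `eps` uses the closed form $\tfrac12(i-j)(j-k)(k-i)$, which reproduces the permutation rules for three consecutive integer index values (here 0, 1, 2, because Python lists are zero-based).

```python
def eps(i, j, k):
    """Alternating symbol via (i-j)(j-k)(k-i)/2; valid for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2


def cross_index(A, B):
    """k-th component is eps_kij A_i B_j, summed over the repeated i and j."""
    return [sum(eps(k, i, j) * A[i] * B[j] for i in range(3) for j in range(3))
            for k in range(3)]


# Sample vectors, chosen arbitrarily.
A = [1, 2, 3]
B = [4, 5, 6]

print(cross_index(A, B))
```

Each component has only two nonzero terms in its nine-term double sum, one with each sign, which is the usual determinant pattern for the cross product.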

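You can use the summation convention to advantage in calculus too. The $\nabla$ vector operator has components

$$\nabla_i \qquad\text{or some people prefer}\qquad \partial_i$$

For unity of notation, use $x_1 = x$ and $x_2 = y$ and $x_3 = z$. In this language,

$$\partial_1 \equiv \nabla_1 \quad\text{is}\quad \frac{\partial}{\partial x} \equiv \frac{\partial}{\partial x_1} \tag{9.53}$$

Note: This notation applies to rectangular component calculations only! ($\hat e_i\cdot\hat e_j = \delta_{ij}$.) The generalization to curved coordinate systems will wait until chapter 12.

$$\operatorname{div}\vec v = \nabla\cdot\vec v = \partial_iv_i = \frac{\partial v_1}{\partial x_1} + \frac{\partial v_2}{\partial x_2} + \frac{\partial v_3}{\partial x_3} \tag{9.54}$$

You should verify that $\partial_ix_j = \delta_{ij}$.

Similarly the curl is expressed using the alternating symbol.

$$\operatorname{curl}\vec v = \nabla\times\vec v \quad\text{becomes}\quad \epsilon_{ijk}\partial_jv_k = (\operatorname{curl}\vec v)_i \tag{9.55}$$

the $i^{\text{th}}$ components of the curl.

An example of manipulating these objects: Prove that $\operatorname{curl}\operatorname{grad}\phi = 0$.

$$\operatorname{curl}\operatorname{grad}\phi = \nabla\times\nabla\phi \longrightarrow \epsilon_{ijk}\partial_j\partial_k\phi \tag{9.56}$$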


You can interchange the order of the differentiation as always, and the trick here is to relabel the indices; that is a standard technique in this business.

$$\epsilon_{ijk}\partial_j\partial_k\phi = \epsilon_{ijk}\partial_k\partial_j\phi = \epsilon_{ikj}\partial_j\partial_k\phi$$

The first equation interchanges the order of differentiation. In the second equation, I call "j" "k" and I call "k" "j". These are dummy indices, summed over, so this can't affect the result, but now this expression looks like that of Eq. (9.56) except that two (dummy) indices are reversed. The $\epsilon$ symbol is antisymmetric under interchange of any of its indices, so the final expression is the negative of the expression in Eq. (9.56). The only number equal to minus itself is zero, so the identity is proved.
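
The argument only uses the fact that $\partial_j\partial_k\phi$ is symmetric in $j$ and $k$, so it can be checked with any symmetric array. The sketch below is an illustration, not part of the proof: the sample function $\phi = x^2y + yz^3$, its hand-computed second derivatives, and the helper names are all assumptions made for this example.

```python
def eps(i, j, k):
    """Alternating symbol for indices 0, 1, 2 via (i-j)(j-k)(k-i)/2."""
    return (i - j) * (j - k) * (k - i) // 2


def contract(S):
    """eps_ijk S_jk for i = 0, 1, 2, summed over the repeated j and k."""
    return [sum(eps(i, j, k) * S[j][k] for j in range(3) for k in range(3))
            for i in range(3)]


def hessian(x, y, z):
    """Second derivatives of the sample phi = x^2 y + y z^3, worked out by hand."""
    return [[2 * y, 2 * x, 0.0],
            [2 * x, 0.0, 3 * z**2],
            [0.0, 3 * z**2, 6 * y * z]]


# Symmetric in (j, k), so every component of "curl grad phi" vanishes.
print(contract(hessian(1.5, -0.7, 2.0)))

# A non-symmetric array does not contract to zero; symmetry is what matters.
M = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(contract(M))
```

The second print shows why the interchange of derivatives is the essential step: without it, nothing forces the contraction with $\epsilon_{ijk}$ to vanish.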

9.11 More Complicated Potentials 

The gravitational field from a point mass is $\vec g = -Gm\hat r/r^2$, so the potential for this point mass is $\phi = -Gm/r$. This satisfies

$$\vec g = -\nabla\phi = -\nabla\frac{-Gm}{r} = \hat r\frac{\partial}{\partial r}\frac{Gm}{r} = -\frac{Gm\hat r}{r^2}$$

For several point masses, the gravitational field is the vector sum of the contributions from each mass. In the same way the gravitational potential is the (scalar) sum of the potentials contributed by each mass. This is almost always easier to calculate than the vector sum. If the distribution is continuous, you have an integral.

$$\phi_{\text{total}} = -\sum_k\frac{Gm_k}{r_k} \qquad\text{or}\qquad -\int\frac{G\,dm}{r}$$


This sort of very abbreviated notation for the sums and integrals is normal once you have done a lot of them, but when you're just getting started it is useful to go back and forth between this terse notation and a more verbose form. Expand the notation and you have

$$\phi_{\text{total}}(\vec r) = -G\int\frac{dm}{|\vec r - \vec r\,'|} \tag{9.57}$$

This is still not very explicit, so expand it some more. Let

$$\vec r\,' = \hat x x' + \hat y y' + \hat z z' \qquad\text{and}\qquad \vec r = \hat x x + \hat y y + \hat z z$$

$$\text{then}\qquad \phi(x,y,z) = -G\int dx'\,dy'\,dz'\,\rho(x',y',z')\,\frac{1}{\sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2}}$$

where $\rho$ is the volume mass density so that $dm = \rho\,dV = \rho\,d^3r'$, and the limits of integration are such that this extends over the whole volume of the mass that is the source of the potential. The primed coordinates represent the positions of the masses, and the non-primed ones are the position of the point where you are evaluating the potential, the field point. The combination $d^3r$ is a common notation for a volume element in three dimensions.

For a simple example, what is the gravitational potential from a uniform thin rod? Place its center at the origin and its length $= 2L$ along the $z$-axis. The potential is

$$\phi(\vec r) = -\int\frac{G\,dm}{r} = -G\int_{-L}^{+L}\frac{\lambda\,dz'}{\sqrt{x^2 + y^2 + (z - z')^2}}$$

where $\lambda = M/2L$ is its linear mass density. This is an elementary integral. Let $u = z' - z$, and $a = \sqrt{x^2 + y^2}$.

$$\phi = -G\lambda\int_{-L-z}^{L-z}\frac{du}{\sqrt{a^2 + u^2}} = -G\lambda\int d\theta = -G\lambda\,\theta\,\Big|_{u=-L-z}^{u=L-z}$$

where $u = a\sinh\theta$. Put this back into the original variables and you have

$$\phi = -G\lambda\left[\sinh^{-1}\left(\frac{L - z}{\sqrt{x^2 + y^2}}\right) + \sinh^{-1}\left(\frac{L + z}{\sqrt{x^2 + y^2}}\right)\right] \tag{9.58}$$

The inverse hyperbolic function is a logarithm as in Eq. (1.4), so this can be rearranged and the terms combined into the logarithm of a function of $x$, $y$, and $z$, but the $\sinh^{-1}$s are easier to work with so there's not much point. This is not too complicated a result, and it is far easier to handle than the vector field you get if you take its gradient. It's still necessary to analyze it in order to understand it and to check for errors. See problem 9.48.
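
One way to check a closed form like Eq. (9.58) for errors is to compare it with a brute-force numerical integration of the original integral. The sketch below is an illustration under assumptions: the units are chosen so that $G = \lambda = 1$, the midpoint rule and the number of slices `n` are arbitrary choices, and the function names are made up here.

```python
import math


def phi_closed(x, y, z, L, G=1.0, lam=1.0):
    """Rod potential from the closed form with the two inverse hyperbolic sines."""
    a = math.hypot(x, y)  # a = sqrt(x^2 + y^2), must be nonzero
    return -G * lam * (math.asinh((L - z) / a) + math.asinh((L + z) / a))


def phi_numeric(x, y, z, L, G=1.0, lam=1.0, n=20000):
    """Midpoint-rule estimate of -G * integral of lam dz' / distance over the rod."""
    a2 = x * x + y * y
    h = 2.0 * L / n
    total = 0.0
    for k in range(n):
        zp = -L + (k + 0.5) * h  # midpoint of the k-th slice of the rod
        total += h / math.sqrt(a2 + (z - zp) ** 2)
    return -G * lam * total


L = 1.0
print(phi_closed(0.3, 0.4, 0.2, L), phi_numeric(0.3, 0.4, 0.2, L))

# Far from the rod the result should approach that of a point mass M = 2*L*lam.
r = 50.0
print(phi_closed(30.0, 40.0, 0.0, L), -2.0 * L / r)
```

The second comparison is the kind of limiting-case analysis the text asks for: at large distance the rod should be indistinguishable from a point mass.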


Exercises 

1 Prove that the geometric interpretation of a cross product is as an area.

2 Start from a picture of $\vec C = \vec A - \vec B$ and use the definition and properties of the dot product to derive the law of cosines. (If this takes you more than about three lines, start over, and no components, just vectors.)

3 Start from a picture of $\vec C = \vec A - \vec B$ and use the definition and properties of the cross product to derive the law of sines. (If this takes you more than a few lines, start over.)

4 Show that $\vec A\cdot\vec B\times\vec C = \vec A\times\vec B\cdot\vec C$. Do this by drawing pictures of the three vectors and find the geometric meaning of each side of the equation, showing that they are the same (including sign).

5 (a) If the dot product of a given vector $\vec F$ with every vector results in zero, show that $\vec F = 0$. (b) Same for the cross product.

6 From the definition of the dot product, and in two dimensions, draw pictures to interpret $\vec A\cdot(\vec B + \vec C)$ and from there prove the distributive law: $\vec A\cdot(\vec B + \vec C) = \vec A\cdot\vec B + \vec A\cdot\vec C$. (Draw $\vec B + \vec C$ tip-to-tail.)

7 For a sphere, from the definition of the integral, what is $\oint d\vec A$? What is $\oint dA$?

8 What is the divergence of $\hat x\,xy + \hat y\,yz + \hat z\,zx$?

9 What is the divergence of $\hat r\,r\sin\theta + \hat\theta\,r_0\sin\theta\cos\phi + \hat\phi\,r\cos\phi$? (spherical)

10 What is the divergence of $\hat r\,r\sin\phi + \hat\phi\,z\sin\phi + \hat z\,zr$? (cylindrical)

11 In cylindrical coordinates draw a picture of the vector field $\vec v = \hat\phi\,r^2(2 + \cos\phi)$ (for $z = 0$). Compute the divergence of $\vec v$ and indicate in a second sketch what it looks like. Draw field lines for $\vec v$.

12 What is the curl of the vector field in the preceding exercise (and indicate in a sketch what it's like).

13 Calculate $\nabla\cdot\hat r$. (a) in cylindrical and (b) in spherical coordinates.

14 Compute $\partial_ix_j$.

15 Compute div curl $\vec v$ using index notation.

16 Show that $\epsilon_{ijk} = \tfrac12(i - j)(j - k)(k - i)$.

17 Use index notation to derive $\nabla\cdot(f\vec v) = (\nabla f)\cdot\vec v + f\,\nabla\cdot\vec v$.

18 Write Eq. (8.6) in index notation, where $(x,y)\to x_i$ and $(x',y')\to x'_j$.

Problems

9.1 Use the same geometry as that following Eq. (9.3), and take the velocity function to be $\vec v = \hat x\,v_0xy/b^2$. Take the bottom edge of the plane to be at $(x,y) = (0,0)$ and calculate the flow rate. Should the result be independent of the angle $\phi$? Sketch the flow to understand this point. Does the result check for any special, simple value of $\phi$?
Ans: $(v_0ab\tan\phi)/3$

9.2 Repeat the preceding problem using the cylindrical surface of Eq. (9.4), but place the bottom point of the cylinder at coordinate $(x,y) = (x_0, 0)$. Ans: $(v_0a/4)(2x_0 + \pi b/4)$

9.3 Use the same velocity function $\vec v = \hat x\,v_0xy/b^2$ and evaluate the flow integral outward from the closed surface of the rectangular box, $(c < x < d)$, $(0 < y < b)$, $(0 < z < a)$. The convention is that the unit normal vector points outward from the six faces of the box. Ans: $v_0a(d - c)/2$

9.4 Work out the details, deriving the divergence of a vector field in spherical coordinates, Eq. (9.16). 

9.5 (a) For the vector field $\vec v = A\vec r$, that is pointing away from the origin with a magnitude proportional to the distance from the origin, express this in rectangular components and compute its divergence.

(b) Repeat this in cylindrical coordinates (still pointing away from the origin though).

(c) Repeat this in spherical coordinates, Eq. (9.16).

9.6 Gauss's law for electromagnetism says $\oint\vec E\cdot d\vec A = q_{\text{encl}}/\epsilon_0$. If the electric field is given to be $\vec E = A\vec r$, what is the surface integral of $\vec E$ over the whole closed surface of the cube that spans the region from the origin to $(x,y,z) = (a,a,a)$?

(a) What is the charge enclosed in the cube?

(b) Compute the volume integral, $\int d^3r\,\nabla\cdot\vec E$, inside the same cube.

9.7 Evaluate the surface integral, $\oint\vec v\cdot d\vec A$, of $\vec v = \hat r\,Ar^2\sin^2\theta + \hat\theta\,Br\cos\theta\sin\phi$ over the surface of the sphere centered at the origin and of radius $R$. Recall section 8.8.

9.8 (a) What is the area of the spherical cap on the surface of a sphere of radius $R$: $0 < \theta < \theta_0$?

(b) Does the result have the correct behavior for both small and large $\theta_0$?

(c) What are the surface integrals over this cap of the vector field $\vec v = \hat r\,v_0\cos\theta\sin^2\phi$? Consider both $\int\vec v\cdot d\vec A$ and $\int\vec v\times d\vec A$. Ans: $v_0\pi R^2(1 - \cos^2\theta_0)/2$

9.9 A rectangular area is specified parallel to the $x$-$y$ plane at $z = d$ and $0 < x < a$, $a < y < b$. A vector field is $\vec v = \left(\hat x\,Axyz + \hat y\,Byx^2 + \hat z\,Cx^2yz^2\right)$. Evaluate the two integrals over this surface

$$\int\vec v\cdot d\vec A, \qquad\text{and}\qquad \int d\vec A\times\vec v$$

9.10 For the vector field $\vec v = Ar^n\hat r$, compute the integral over the surface of a sphere of radius $R$ centered at the origin: $\oint\vec v\cdot d\vec A$. $(n > 0)$

Compute the integral over the volume of this same sphere, $\int d^3r\,\nabla\cdot\vec v$.

9.11 The velocity of a point in a rotating rigid body is $\vec v = \vec\omega\times\vec r$. See problem 7.5. Compute its divergence and curl. Do this in rectangular, cylindrical, and spherical coordinates.
9.12 Fill in the missing steps in the calculation of Eq. (9.29). 

9.13 Mimic the calculation in section 9.6 for the divergence in cylindrical coordinates, computing the curl in cylindrical coordinates, $\nabla\times\vec v$. Ans: Eq. (9.32).

9.14 Another way to get to Eq. (9.39) is to work with Eq. (9.38) directly and to write the function $\rho(r)$ explicitly as two cases: $r < R$ and $r > R$. Multiply Eq. (9.38) by $r^2$ and integrate it from zero to $r$, being careful to handle the integral differently when the upper limit is $< R$ and when it is $> R$.

$$r^2g_r(r) = -4\pi G\int_0^r dr'\,r'^2\rho(r')$$

Note: This is not simply reproducing that calculation that I've already done. This is doing it a different way.

9.15 If you have a very large (assume it's infinite) slab of mass of thickness $d$ the gravitational field will be perpendicular to its plane. To be specific, say that there is a uniform mass density $\rho_0$ between $z = \pm d/2$ and that $\vec g = g_z(z)\hat z$. Use Eqs. (9.36) to find $g_z(z)$.

Be precise in your reasoning when you evaluate any constants. (What happens when you rotate the system about the $x$-axis?) Does your graph of the result make sense?

Ans: in part, $g_z = +2\pi G\rho_0d$, $(z < -d/2)$

9.16 Use Eqs. (9.36) to find the gravitational field of a very long solid cylinder of uniform mass density $\rho_0$ and radius $R$. (Assume it's infinitely long.) Start from the assumption that in cylindrical coordinates the field is $\vec g = g_r(r,\phi,z)\hat r$, and apply both equations.

Ans: in part $g_r = -2\pi G\rho_0r$, $(0 < r < R)$

9.17 The gravitational field in a spherical region $r < R$ is stated to be $\vec g(r) = -\hat r\,C/r$, where $C$ is a constant. What mass density does this imply?

If there is no mass for $r > R$, what is $\vec g$ there?

9.18 In Eq. (8.23) you have an approximate expression for the gravitational field of Earth, including the effect of the equatorial bulge. Does it satisfy Eqs. (9.36)? ($r > R_{\text{Earth}}$)

9.19 Compute the divergence of the velocity function in problem 9.3 and integrate this divergence over the volume of the box specified there. Ans: $(d - c)av_0/2$

9.20 The gravitational potential, equation (9.42), for the case that the mass density is zero says to set the Laplacian Eq. (9.43) equal to zero. Assume a solution to $\nabla^2V = 0$ to be a function of the spherical coordinates $r$ and $\theta$ alone and that

$$V(r,\theta) = Ar^{-(\ell+1)}f(x), \qquad\text{where}\quad x = \cos\theta$$

Show that this works provided that $f$ satisfies a certain differential equation and show that it is the Legendre equation of Eq. (4.18) and section 4.11.

9.21 The volume energy density, $dU/dV$, in the electric field is $\epsilon_0E^2/2$. The electrostatic field equations are the same as the gravitational field equations, Eq. (9.36).

$$\nabla\cdot\vec E = \rho/\epsilon_0, \qquad\text{and}\qquad \nabla\times\vec E = 0$$

A uniformly charged ball of radius $R$ has charge density $\rho_0$ for $r < R$, $Q = 4\pi\rho_0R^3/3$.

(a) What is the electric field everywhere due to this charge distribution?

(b) The total energy of this electric field is the integral over all space of the energy density. What is it?

(c) If you want to account for the mass of the electron by saying that all this energy that you just computed is the electron's mass via $E_0 = mc^2$, then what must the electron's radius be? What is its numerical value? Ans: $r_e = \tfrac35\left(e^2/4\pi\epsilon_0mc^2\right) = 1.69$ fm

9.22 The equations relating a magnetic field, $\vec B$, to the current producing it are, for the stationary case,

$$\nabla\times\vec B = \mu_0\vec J \qquad\text{and}\qquad \nabla\cdot\vec B = 0$$

Here $\vec J$ is the current density, current per area, defined so that across a tiny area $d\vec A$ the current that flows through the area is $dI = \vec J\cdot d\vec A$. A cylindrical wire of radius $R$ carries a total current $I$ distributed uniformly across the cross section of the wire. Put the $z$-axis of a cylindrical coordinate system along the central axis of the wire with positive $z$ in the direction of the current flow. Write the function $\vec J$ explicitly in these coordinates (for all values of $r < R$, $r > R$). Use the curl and divergence expressed in cylindrical coordinates and assume a solution in the form $\vec B = \hat\phi\,B_\phi(r,\phi,z)$. Write out the divergence and curl equations and show that you can satisfy these equations relating $\vec J$ and $\vec B$ with such a form, solving for $B_\phi$. Sketch a graph of the result. At a certain point in the calculation you will have to match the boundary conditions at $r = R$. Recall that the tangential component of $\vec B$ (here $B_\phi$) is continuous at the boundary.

Ans: in part, $\mu_0Ir/2\pi R^2$ $(r < R)$

9.23 A long cylinder of radius $R$ has a uniform charge density inside it, $\rho_0$, and it is rotating about its long axis with angular speed $\omega$. This provides an azimuthal current density $\vec J = \rho_0r\omega\hat\phi$ in cylindrical coordinates. Assume the form of the magnetic field that this creates has only a $z$-component: $\vec B = B_z(r,\phi,z)\hat z$ and apply the equations of the preceding problem to determine this field both inside and outside. The continuity condition at $r = R$ is that the tangential component of $\vec B$ (here it is $B_z$) is continuous there. The divergence and the curl equations will (almost) determine the rest. Ans: in part, $-\rho r^2\omega/2 + C$ $(r < R)$

9.24 By analogy to Eqs. (9.9) and (9.17) the expression

$$\lim_{V\to0}\frac{1}{V}\oint\phi\,d\vec A$$

is the gradient of the scalar function $\phi$. Compute this in rectangular coordinates by mimicking the derivation that led to Eq. (9.11) or (9.15), showing that it has the correct components.

9.25 (a) A fluid of possibly non-uniform mass density is in equilibrium in a possibly non-uniform gravitational field. Pick a volume and write down the total force vector on the fluid in that volume; the things acting on it are gravity and the surrounding fluid. Take the limit as the volume shrinks to zero, and use the result of the preceding problem in order to get the equation for equilibrium.

(b) Now apply the result to the special case of a uniform gravitational field and a constant mass density to find the pressure variation with height. Starting from an atmospheric pressure of $1.01\times10^5\ \text{N/m}^2$, how far must you go under water to reach double this pressure?

Ans: Vp = — pg\ about 10 meters 
9.26 The volume energy density, $u = dU/dV$, in the gravitational field is $g^2/8\pi G$. [Check the units to see if it makes sense.] Use the results found in Eq. (9.39) for the gravitational field of a spherical mass and get the energy density. An extension of Newton's theory of gravity is that the source of gravity is energy not just mass! This energy that you just computed from the gravitational field is then the source of more gravity, and this energy density contributes as a mass density $\rho = u/c^2$ would.

(a) Find the additional gravitational field $g_r(r)$ that this provides and add it to the previous result for $g_r(r)$.

(b) For our sun, its mass is $2\times10^{30}$ kg and its radius is 700,000 km. Assume its density is constant throughout so that you can apply the results of this problem. At the sun's surface, what is the ratio of this correction to the original value?

(c) What radius would the sun have to be so that this correction is equal to the original $g_r(R)$, resulting in double gravity? Ans: (a) $g_{\text{correction}}/g_{\text{original}} = GM/10Rc^2$

9.27 Continuing the ideas of the preceding problem, the energy density, $u = dU/dV$, in the gravitational field is $g^2/8\pi G$, and the source of gravity is energy not just mass. In the region of space that is empty of matter, show that the divergence equation for gravity, (9.36), then becomes

$$\nabla\cdot\vec g = -4\pi Gu/c^2 = -g^2/2c^2$$

Assume that you have a spherically symmetric system, $\vec g = g_r(r)\hat r$, and write the differential equation for $g_r$.

(a) Solve it and apply the boundary condition that as $r\to\infty$, the gravitational field should go to $g_r(r)\to -GM/r^2$. How does this solution behave as $r\to0$? Compare its behavior to that of the usual gravitational field of a point mass.

(b) Can you explain why the behavior is different? Note that in this problem it is the gravitational field itself that is the source of the gravitational field; mass as such isn't present.

(c) A characteristic length appears in this calculation. Evaluate it for the sun. It is 1/4 the Schwarzschild radius that appears in the theory of general relativity.

Ans: (a) $g_r = -GM/[r(r + R)]$, where $R = GM/2c^2$

9.28 In the preceding problem, what is the total energy in the gravitational field, $\int u\,dV$? How does this ($\div\,c^2$) compare to the mass $M$ that you used in setting the value of $g_r$ as $r\to\infty$?

9.29 Verify that the solution Eq. (9.48) does satisfy the continuity conditions on $V$ and $V'$.

9.30 The $r$-derivatives in Eq. (9.43), spherical coordinates, can be written in a different and more convenient form. Show that they are equivalent to

$$\frac{1}{r}\frac{\partial^2(rV)}{\partial r^2}$$

9.31 The gravitational potential from a point mass $M$ is $-GM/r$ where $r$ is the distance to the mass. Place a single point mass at coordinates $(x,y,z) = (0,0,d)$ and write its potential $V$. Write this expression in terms of spherical coordinates about the origin, $(r,\theta)$, and then expand it for the case $r > d$ in a power series in $d/r$, putting together the like powers of $d/r$. Do this through order $(d/r)^3$. Express the result in the language of Eq. (4.61).

Ans: $-\dfrac{GM}{r} - \dfrac{GMd}{r^2}\left[\cos\theta\right] - \dfrac{GMd^2}{r^3}\left[\tfrac32\cos^2\theta - \tfrac12\right] - \dfrac{GMd^3}{r^4}\left[\tfrac52\cos^3\theta - \tfrac32\cos\theta\right]$

9.32 As in the preceding problem a point mass $M$ has potential $-GM/r$ where $r$ is the distance to the mass. The mass is at coordinates $(x,y,z) = (0,0,d)$. Write its potential $V$ in terms of spherical coordinates about the origin, $(r,\theta)$, but this time take $r < d$ and expand it in a power series in $r/d$. Do this through order $(r/d)^3$.

Ans: $(-GM/d)\left[1 + (r/d)P_1(\cos\theta) + (r^2/d^2)P_2(\cos\theta) + (r^3/d^3)P_3(\cos\theta) + \cdots\right]$

9.33 Theorem: Given that a vector field satisfies $\nabla\times\vec v = 0$ everywhere, then it follows that you can write $\vec v$ as the gradient of a scalar function, $\vec v = -\nabla\psi$. For each of the following vector fields find, if possible, and probably by trial and error if so, a function $\psi$ that does this. First determine if the curl is zero, because if it isn't then your hunt for a $\psi$ will be futile. You should try however; the hunt will be instructive.

(a) $\hat x\,y^3 + 3\hat y\,xy^2$,
(b) $\hat x\,x^2y + \hat y\,xy^2$,
(c) $\hat x\,y\cos(xy) + \hat y\,x\cos(xy)$,
(d) $\hat x\,y^2\sinh(2xy^2) + 2\hat y\,xy\sinh(2xy^2)$

9.34 A hollow sphere has inner radius $a$, outer radius $b$, and mass $M$, with uniform mass density in this region.

(a) Find (and sketch) its gravitational field $g_r(r)$ everywhere.

(b) What happens in the limit that $a\to b$? In this limiting case, graph $g_r$. Use $g_r(r) = -dV/dr$ and compute and graph the potential function $V(r)$ for this limiting case. This violates Eq. (9.47). Why?

(c) Compute the area mass density, $\sigma = dM/dA$, in this limiting case and find the relationship between the discontinuity in $dV/dr$ and the value of $\sigma$.

9.35 Evaluate

$$\partial_ix_j, \qquad \partial_ix_i, \qquad \epsilon_{ijk}\epsilon_{ijk}, \qquad \delta_{ij}v_j$$

and show that $\epsilon_{ijk}\epsilon_{mnk} = \delta_{im}\delta_{jn} - \delta_{in}\delta_{jm}$

9.36 Verify the identities for arbitrary $\vec A$,

$$(\vec A\cdot\nabla)\vec r = \vec A \qquad\text{or}\qquad A_i\partial_ix_j = A_j$$

$$\nabla\cdot\nabla\times\vec v = 0 \qquad\text{or}\qquad \partial_i\epsilon_{ijk}\partial_jv_k = 0$$

$$\nabla\cdot(f\vec A) = (\nabla f)\cdot\vec A + f(\nabla\cdot\vec A) \qquad\text{or}\qquad \partial_i(fA_i) = (\partial_if)A_i + f\partial_iA_i$$

You can try proving all these in the standard vector notation, but use the index notation instead. It's a lot easier.

9.37 Use index notation to prove $\nabla\times\nabla\times\vec v = \nabla(\nabla\cdot\vec v) - \nabla^2\vec v$. First, what identity do you have to prove about $\epsilon$'s?

9.38 Is $\nabla\times\vec v$ perpendicular to $\vec v$? Either prove it's true or give an explicit example for which it is false.

9.39 If for arbitrary $A_i$ and arbitrary $B_j$ it is known that $a_{ij}A_iB_j = 0$, prove then that all the $a_{ij}$ are zero.

9.40 Compute the divergences of

$Ax\,\hat x + By^2\,\hat y + C\,\hat z$ in rectangular coordinates.

$Ar\,\hat r + B\theta^2\,\hat\theta + C\,\hat\phi$ in spherical coordinates.

How do the pictures of these vector fields correspond to the results of these calculations?

9.41 Compute the divergence and the curl of

$$\frac{y\hat x - x\hat y}{x^2 + y^2}, \qquad\text{and of}\qquad \frac{y\hat x - x\hat y}{(x^2 + y^2)^2}$$

9.42 Translate the preceding vector fields into polar coordinates, then take their divergence and curl. And of course 
draw some pictures. 

9.43 As a review of ordinary vector algebra, and perhaps some practice in using index notation, translate the triple scalar product into index notation and prove first that it is invariant under cyclic permutations of the vectors.

(a) $\vec A\cdot\vec B\times\vec C = \vec B\cdot\vec C\times\vec A = \vec C\cdot\vec A\times\vec B$. Then that

(b) $\vec A\cdot\vec B\times\vec C = \vec A\times\vec B\cdot\vec C$.

(c) What is the result of interchanging any pair of the vectors in the product?

(d) Show why the geometric interpretation of this product is as the volume of a parallelepiped.

9.44 What is the total flux, $\oint\vec E\cdot d\vec A$, out of the cube of side $a$ with one corner at the origin?

(a) $\vec E = \alpha\hat x + \beta\hat y + \gamma\hat z$

(b) $\vec E = \alpha x\hat x + \beta y\hat y + \gamma z\hat z$.

9.45 The electric potential from a single point charge $q$ is $kq/r$. Two charges are on the $z$-axis: $-q$ at position $z = z_0$ and $+q$ at position $z_0 + a$.

(a) Write the total potential at the point $(r,\theta,\phi)$ in spherical coordinates.

(b) Assume that $r\gg a$ and $r\gg z_0$, and use the binomial expansion to find the series expansion for the total potential out to terms of order $1/r^3$.

(c) How does the coefficient of the $1/r^2$ term depend on $z_0$? The coefficient of the $1/r^3$ term? These tell you the total electric dipole moment and the total quadrupole moment.

(d) What is the curl of the gradient of each of these two terms?

The polynomials of section 4.11 will appear here, with argument $\cos\theta$.

9.46 For two point charges $q_1$ and $q_2$, the electric field very far away will look like that of a single point charge $q_1 + q_2$. Go the next step beyond this and show that the electric field at large distances will approach a direction such that it points along a line that passes through the "center of charge" (like the center of mass): $(q_1\vec r_1 + q_2\vec r_2)/(q_1 + q_2)$. What happens to this calculation if $q_2 = -q_1$? You will find the results of problem 9.31 useful. Sketch various cases of course. At a certain point in the calculation, you will probably want to pick a particular coordinate system and to place the charges conveniently, probably one at the origin and the other on the $z$-axis. You should keep terms in the expansion for the potential up through $1/r^2$ and then take $-\nabla V$. Unless of course you find a better way.

9.47 Fill in the missing steps in deriving Eq. (9.58). 

9.48 Analyze the behavior of Eq. (9.58). The first thing you will have to do is to derive the behavior of $\sinh^{-1}$ in various domains and maybe to do some power series expansions. In every case seek an explanation of why the result comes out as it does.

(a) If $z = 0$ and $r = \sqrt{x^2 + y^2}\gg L$ what is it and what should it be? (And no, zero won't do.)

(b) If $z = 0$ and $r\ll L$ what is it and what should it be?

(c) If $z > L$ and $r\to0$ what is this and what should it be? Be careful with your square roots here.

(d) What is the result of (c) for $z\gg L$ and for $z - L\ll L$?

9.49 Use the magnetic field equations as in problem 9.22, and assume a current density that is purely azimuthal and dependent on $r$ alone. That is, in cylindrical coordinates it is

$$\vec J = J_0\hat\phi f(r)$$

Look for a solution for the magnetic field in the form $\vec B = \hat z B_z(r)$. What is the total current per length in this case? That is, for a length $\Delta z$ how much current is going around the axis and how is this related to $B_z$ along $r = 0$? Examine also the special case for which $f(r) = 0$ except in a narrow range $a < r < b$ with $b - a\ll b$ (thin). Compare this result to what you find in introductory texts about the solenoid.

9.50 For a spherical mass distribution such as the Earth, what would the mass density function have to be so that the gravitational field has constant strength as a function of depth? Ans: $\rho\propto1/r$.

9.51 Use index notation to derive

$$(\vec A\times\vec B)\cdot(\vec C\times\vec D) = (\vec A\cdot\vec C)(\vec B\cdot\vec D) - (\vec A\cdot\vec D)(\vec B\cdot\vec C)$$

9.52 Show that $\nabla\cdot(\vec A\times\vec B) = \vec B\cdot\nabla\times\vec A - \vec A\cdot\nabla\times\vec B$. Use index notation to derive this.

9.53 Use index notation to compute $\nabla e^{i\vec k\cdot\vec r}$. Also compute the Laplacian of the same exponential, $\nabla^2 = \operatorname{div}\operatorname{grad}$.

9.54 Derive the force of one charged ring on another, as shown in equation (2.35).

9.55 A point mass $m$ is placed at a distance $d > R$ from the center of a spherical shell of radius $R$ and mass $M$. Starting from Newton's gravitational law for point masses, derive the force on $m$ from $M$. Place $m$ on the $z$-axis and use spherical coordinates to express the piece of $dM$ within $d\theta$ and $d\phi$. (This problem slowed Newton down when he first tried to solve it, as he had to stop and invent integral calculus first.)

9.56 The volume charge density is measured near the surface of a (not quite perfect) conductor to be $\rho(x) = \rho_0e^{x/a}$ for $x < 0$. The conductor occupies the region $x < 0$, so the electric field in the conductor is zero once you're past the thin surface charge density. Find the electric field everywhere and graph it. Assume that everything is independent of $y$ and $z$ (and $t$).


Partial Differential Equations 


If the subject of ordinary differential equations is large, this is enormous. I am going to examine only one corner of it, 
and will develop only one tool to handle it: Separation of Variables. Another major tool is the method of characteristics 
and I'll not go beyond mentioning the word. When I develop a technique to handle the heat equation or the potential 
equation, don’t think that it stops there. The same set of tools will work on the Schroedinger equation in quantum 
mechanics and on the wave equation in its many incarnations. 

10.1 The Heat Equation 

The flow of heat in one dimension is described by the heat conduction equation

$$P = -\kappa A\frac{\partial T}{\partial x} \tag{10.1}$$

where $P$ is the power in the form of heat energy flowing toward positive $x$ through a wall, $A$ is the area of the wall, and $\kappa$ is the wall's thermal conductivity. Put this equation into words and it says that if a thin slab of material has a temperature on one side different from that on the other, then heat energy will flow through the slab. If the temperature difference is big or the wall is thin ($\partial T/\partial x$ is big) then there's a big flow. The minus sign says that the energy flows from hot toward cold.

When more heat comes into a region than leaves it, the temperature there will rise. This is described by the specific heat, $c$.

$$dQ = mc\,dT, \qquad\text{or}\qquad \frac{dQ}{dt} = mc\frac{dT}{dt} \tag{10.2}$$

Again in words, the temperature rise in a chunk of material is proportional to the amount of heat added to it and inversely proportional to its mass.


A 


P(x,t) 


X 


P(x + Ax, t) 


x + Ax 
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For a slab of area A, thickness Ax, and mass density p, let the coordinates of the two sides be x and x + Ax. 


m = pAAx , 


and 


dQ 

dt 


P{x , t) - P(x + Ax, t) 


The net power into this volume is the power in from one side minus the power out from the other. Put these three 
equations together. 


dQ 

dt 


= me 


dT 

dt 


If you let Ax — > 0 here, all you get is 0 


4 a dT .dT(x,t) A dT(x + Ax,t) 

pAAxc-r- = -kA — + K A — 

dt ox ox 

0, not very helpful. Instead divide by Ax first and then take the limit. 


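$$ \frac{\partial T}{\partial t} = +\frac{\kappa A}{\rho c A}\left(\frac{\partial T(x+\Delta x,t)}{\partial x} - \frac{\partial T(x,t)}{\partial x}\right)\frac{1}{\Delta x} $$

and in the limit this is

$$ \frac{\partial T}{\partial t} = \frac{\kappa}{c\rho}\,\frac{\partial^2 T}{\partial x^2} \tag{10.3} $$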


I was a little cavalier with the notation in that I didn't specify the argument of T on the left side. You could say that
it was (x + Δx/2, t), but in the limit everything is evaluated at (x,t) anyway. I also assumed that κ, the thermal
conductivity, is independent of x. If not, then it stays inside the derivative,

$$ \frac{\partial T}{\partial t} = \frac{1}{c\rho}\,\frac{\partial}{\partial x}\left(\kappa\,\frac{\partial T}{\partial x}\right) \tag{10.4} $$
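The energy bookkeeping behind Eq. (10.3) can be turned directly into a small numerical sketch. The function below is only an illustration, not a method from this text: the name `step_slab`, the choice of equal cells of width dx, and the fixed-temperature ends are all assumptions made for the example. It applies Eq. (10.1) at every cell face and Eq. (10.2) to every cell, with the area A cancelling.

```python
import math

def step_slab(T, kappa, c, rho, dx, dt, T_left=0.0, T_right=0.0):
    """Advance the cell temperatures T by one time step dt.

    Each cell is a thin slab of width dx. The two ends of the material are
    held at T_left and T_right (an assumption made for this sketch).
    """
    n = len(T)
    # Power per unit area across each of the n + 1 faces, P/A = -kappa dT/dx.
    # The end faces are half a cell away from the nearest cell center.
    flux = [0.0] * (n + 1)
    flux[0] = -kappa * (T[0] - T_left) / (dx / 2.0)
    for i in range(1, n):
        flux[i] = -kappa * (T[i] - T[i - 1]) / dx
    flux[n] = -kappa * (T_right - T[n - 1]) / (dx / 2.0)
    # Net power in = power in at the left face minus power out at the right face,
    # and dT = dQ / (m c) with m = rho * A * dx.
    return [T[i] + dt * (flux[i] - flux[i + 1]) / (rho * c * dx) for i in range(n)]
```

For this simple explicit step to stay stable, dt has to be small compared to dx²cρ/κ; a quarter of that value is a safe choice in the sketch above.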


In Three Dimensions 

In three dimensions, this becomes 

$$ \frac{\partial T}{\partial t} = \frac{\kappa}{c\rho}\,\nabla^2 T \tag{10.5} $$

Roughly speaking, the temperature in a box can change because of heat flow in any of three directions. More precisely, 
the correct three dimensional equation that replaces Eq. (10.1) is 


$$ \vec H = -\kappa\,\nabla T \tag{10.6} $$
where H is the heat flow vector. That is the power per area in the direction of the energy transport. H · dA = dP, the
power going across the area dA. The total heat flowing into a volume is

$$ \frac{dQ}{dt} = -\oint dP = -\oint \vec H\cdot d\vec A \tag{10.7} $$


where the minus sign occurs because this is the heat flow in. For a small volume ΔV, you now have m = ρΔV and

$$ mc\,\frac{\partial T}{\partial t} = \rho\,\Delta V\,c\,\frac{\partial T}{\partial t} = -\oint \vec H\cdot d\vec A $$

Divide by ΔV and take the limit as ΔV → 0. The right hand side is the divergence, Eq. (9.9).

$$ \rho c\,\frac{\partial T}{\partial t} = -\lim_{\Delta V\to 0}\frac{1}{\Delta V}\oint \vec H\cdot d\vec A = -\nabla\cdot\vec H = +\nabla\cdot\kappa\nabla T = +\kappa\nabla^2 T $$

The last step again assumes that the thermal conductivity, κ, is independent of position.


10.2 Separation of Variables 

How do you solve these equations? I'll start with the one-dimensional case and use the method of separation of variables. 
The trick is to start by looking for a solution to the equation in the form of a product of a function of x and a function
of t: T(x,t) = f(t)g(x). I do not assume that every solution to the equation will look like this; that's just not true.
What will happen is that I’ll be able to express every solution as a sum of such factored forms. That this is so is a 
theorem that I don’t plan to prove here. For that you should go to a purely mathematical text on PDEs. 

If you want to find out if you have a solution, plug in: 

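$$ \frac{\partial T}{\partial t} = \frac{\kappa}{c\rho}\,\frac{\partial^2 T}{\partial x^2} \qquad\text{is}\qquad \frac{df}{dt}\,g = \frac{\kappa}{c\rho}\,f\,\frac{d^2 g}{dx^2} $$

Denote the constant by κ/cρ = D and divide by the product fg.

$$ \frac{1}{f}\frac{df}{dt} = D\,\frac{1}{g}\frac{d^2 g}{dx^2} \tag{10.8} $$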
The left side of this equation is a function of t alone, no x. The right side is a function of x alone with no t, hence the
name separation of variables. Because x and t can vary quite independently of each other, the only way that this can
happen is if the two sides are constant (the same constant).

$$ \frac{1}{f}\frac{df}{dt} = \alpha \qquad\text{and}\qquad D\,\frac{1}{g}\frac{d^2 g}{dx^2} = \alpha \tag{10.9} $$

At this point, the constant α can be anything, even complex. For a particular specified problem there will be boundary
conditions placed on the functions, and those will constrain the α's. If α is real and positive then

$$ g(x) = A\sinh\sqrt{\alpha/D}\,x + B\cosh\sqrt{\alpha/D}\,x \qquad\text{and}\qquad f(t) = e^{\alpha t} \tag{10.10} $$

For negative real α, the hyperbolic functions become circular functions.

$$ g(x) = A\sin\sqrt{-\alpha/D}\,x + B\cos\sqrt{-\alpha/D}\,x \qquad\text{and}\qquad f(t) = e^{\alpha t} \tag{10.11} $$

If α = 0 then

$$ g(x) = Ax + B, \qquad\text{and}\qquad f(t) = \text{constant} \tag{10.12} $$

For imaginary α the f(t) is oscillating and the g(x) has both exponential and oscillatory behavior in space. This can
really happen in very ordinary physical situations; see section 10.3.

This analysis provides a solution to the original equation (10.3) valid for any α. A sum of such solutions for
different α's is also a solution, for example

$$ T(x,t) = A_1 e^{\alpha_1 t}\sin\sqrt{-\alpha_1/D}\,x + A_2 e^{\alpha_2 t}\sin\sqrt{-\alpha_2/D}\,x $$

or any other linear combination with various α's

$$ T(x,t) = \sum_{\{\alpha\text{'s}\}} f_\alpha(t)\,g_\alpha(x) $$

It is the combined product that forms a solution to the original partial differential equation, not the separate factors. 
Determining the details of the sum is a job for Fourier series. 
Example

A specific problem: You have a slab of material of thickness L and at a uniform temperature T_0. Plunge it into ice water
at temperature T = 0 and find the temperature inside at later times. The boundary condition here is that the surface
temperature is zero, T(0,t) = T(L,t) = 0. This constrains the separated solutions, requiring that g(0) = g(L) = 0.
For this to happen you can't use the hyperbolic functions of x that occur when α > 0; you will need the circular functions
of x, sines and cosines, implying that α < 0. That is also compatible with your expectation that the temperature should
approach zero eventually, and that needs a negative exponential in time, Eq. (10.11).

$$ g(x) = A\sin kx + B\cos kx, \quad\text{with}\quad k^2 = -\alpha/D \quad\text{and}\quad f(t) = e^{-Dk^2 t} $$

g(0) = 0 implies B = 0. g(L) = 0 implies sin kL = 0.

The sine vanishes for the values nπ where n is any integer, positive, negative, or zero. This implies kL = nπ, or
k = nπ/L. The corresponding values of α are α_n = −Dn²π²/L², and the separated solution is

$$ \sin\left(\frac{n\pi x}{L}\right) e^{-n^2\pi^2 Dt/L^2} \tag{10.13} $$

If n = 0 this whole thing vanishes, so it's not much of a solution. (Not so fast there! See problem 10.2.) Notice that 
the sine is an odd function so when n < 0 this expression just reproduces the positive n solution except for an overall 
factor of (—1), and that factor was arbitrary anyway. The negative n solutions are redundant, so ignore them. 

The general solution is a sum of separated solutions, see problem 10.3. 

$$ T(x,t) = \sum_{n=1}^{\infty} a_n \sin\frac{n\pi x}{L}\; e^{-n^2\pi^2 Dt/L^2} \tag{10.14} $$

The problem now is to determine the coefficients a_n. This is why Fourier series were invented. (Yes, literally, the problem
of heat conduction is where Fourier series started.) At time t = 0 you know the temperature distribution is T = T_0, a
constant on 0 < x < L. This general sum must equal T_0 at time t = 0.

$$ T(x,0) = \sum_{n=1}^{\infty} a_n \sin\frac{n\pi x}{L} = T_0 \qquad (0 < x < L) $$

Multiply by sin(mπx/L) and integrate over the domain to isolate the single term, n = m.

$$ \int_0^L dx\, T_0 \sin\frac{m\pi x}{L} = a_m \int_0^L dx\, \sin^2\frac{m\pi x}{L} $$

$$ T_0\,\frac{L}{m\pi}\left[1 - \cos m\pi\right] = a_m\,\frac{L}{2} $$

This expression for a_m vanishes for even m, and when you assemble the whole series for the temperature you have

$$ T(x,t) = \frac{4}{\pi}\,T_0 \sum_{m\ \text{odd}} \frac{1}{m}\,\sin\frac{m\pi x}{L}\; e^{-m^2\pi^2 Dt/L^2} \tag{10.15} $$

For small time, this converges, but very slowly. For large time, the convergence is very fast, often needing only one or 
two terms. As the time approaches infinity, the interior temperature approaches the surface temperature of zero. The 
graph shows the temperature profile at a sequence of times. 



The curves show the temperature dropping very quickly for points near the surface (x = 0 or L). It drops more gradually 
near the center but eventually goes to zero. 
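To see the slow convergence at small time and the fast convergence at large time for yourself, you can sum Eq. (10.15) numerically. This is a minimal sketch; the function names, the default units L = D = T_0 = 1, and the number of terms are choices made for the illustration only.

```python
import math

def slab_temperature(x, t, L=1.0, D=1.0, T0=1.0, n_terms=2000):
    """Partial sum of Eq. (10.15), keeping the first n_terms odd values of m."""
    total = 0.0
    for j in range(n_terms):
        m = 2 * j + 1
        decay = math.exp(-(m * math.pi / L) ** 2 * D * t)
        total += math.sin(m * math.pi * x / L) * decay / m
    return 4.0 * T0 / math.pi * total

def one_term(x, t, L=1.0, D=1.0, T0=1.0):
    """Only the m = 1 term of Eq. (10.15), which dominates at large time."""
    return 4.0 * T0 / math.pi * math.sin(math.pi * x / L) * math.exp(-math.pi ** 2 * D * t / L ** 2)

if __name__ == "__main__":
    # Center of the slab at a sequence of times, in units of L^2 / D.
    for t in (1e-4, 1e-2, 0.1, 0.5):
        print(t, slab_temperature(0.5, t), one_term(0.5, t))
```

At the earliest time the single term is a poor approximation and many terms are needed; by t = 0.5 L²/D the single term is all that matters.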

You can see that the boundary conditions on the temperature led to these specific boundary conditions on the 
sines and cosines. This is exactly what happened in the general development of Fourier series when the fundamental 
relationship, Eq. (5.15), required certain boundary conditions in order to get the orthogonality of the solutions of the 
harmonic oscillator differential equation. That the function vanishes at the boundaries was one of the possible ways to 
ensure orthogonality.

10.3 Oscillating Temperatures 

Take a very thick slab of material and assume that the temperature on one side of it is oscillating. Let the material
occupy the space 0 < x < ∞ and at the coordinate x = 0 the temperature is varying in time as T_1 cos ωt. Is there
any real situation in which this happens? Yes, the surface temperature of the Earth varies periodically from summer to 
winter (even in Florida). What happens to the temperature underground? 

The differential equation for the temperature is still Eq. (10.3), and assume that the temperature inside the 
material approaches T = 0 far away from the surface. Separation of variables is the same as before, Eq. (10.9), but this 
time you know the time dependence at the surface. It's typical in cases involving oscillations that it is easier to work with 
complex exponentials than it is to work with sines and cosines. For that reason, specify that the surface temperature 
is T_1 e^{−iωt} instead of a cosine, understanding that at the end of the problem you must take the real part of the result
and throw away the imaginary part. The imaginary part corresponds to solving the problem for a surface temperature
of sin ωt instead of cosine. It's easier to solve the two problems together than either one separately. (The minus sign in
the exponent of e^{−iωt} is arbitrary; you could use a plus instead.)

The equation (10.9) says that the time dependence that I expect is 


$$ \frac{1}{f}\frac{df}{dt} = \alpha = \frac{1}{e^{-i\omega t}}\left(-i\omega\,e^{-i\omega t}\right) = -i\omega $$


The equation for the x-dependence is then 


$$ D\,\frac{d^2 g}{dx^2} = \alpha g = -i\omega g $$

This is again a simple exponential solution, say e^{βx}. Substitute and you have

$$ D\beta^2 e^{\beta x} = -i\omega\,e^{\beta x}, \qquad\text{implying}\qquad \beta = \pm\sqrt{-i\omega/D} \tag{10.16} $$


Evaluate this as 


$$ \sqrt{-i} = \left(e^{-i\pi/2}\right)^{1/2} = e^{-i\pi/4} = \frac{1-i}{\sqrt 2} $$

Let β_0 = √(ω/2D), then the solution for the x-dependence is

$$ g(x) = A\,e^{(1-i)\beta_0 x} + B\,e^{-(1-i)\beta_0 x} \tag{10.17} $$


Look at the behavior of these two terms. The first has a factor that goes as e^{+x} and the second goes as e^{−x}. The
temperature at large distances is supposed to approach zero, so that says that A = 0. The solution for the temperature
is now

$$ B\,e^{-i\omega t}\,e^{-(1-i)\beta_0 x} \tag{10.18} $$

The further condition is that at x = 0 the temperature is T_1 e^{−iωt}, so that tells you that B = T_1.

$$ T(x,t) = T_1\,e^{-i\omega t}\,e^{-(1-i)\beta_0 x} = T_1\,e^{-\beta_0 x}\,e^{i(-\omega t + \beta_0 x)} \tag{10.19} $$

When you remember that I'm solving for the real part of this solution, the final result is

$$ T_1\,e^{-\beta_0 x}\cos(\beta_0 x - \omega t) \tag{10.20} $$
This has the appearance of a temperature wave moving into the material, albeit a very strongly damped one. In
a half wavelength of this wave, β_0 x = π, and at that point the amplitude coming from the exponential factor out in
front is down by a factor of e^{−π} = 0.04. That's barely noticeable. This is why wine cellars are cellars. Also, you can see
that at a distance where β_0 x > π/2 the temperature change is reversed from the value at the surface. Some distance
underground, summer and winter are reversed. This same sort of equation comes up with the study of eddy currents in 
electromagnetism, so the same sort of results obtain. 
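A quick numerical check of the damped temperature wave is worth a few lines. The sketch below evaluates Eq. (10.20) and uses central differences to confirm that it satisfies the heat equation (10.3) with D = κ/cρ. The function names and the step size h are assumptions made for the illustration.

```python
import math

def beta0(omega, D):
    """Inverse decay length beta_0 = sqrt(omega / 2D), as defined before Eq. (10.17)."""
    return math.sqrt(omega / (2.0 * D))

def temperature(x, t, T1, omega, D):
    """Real part of the solution, Eq. (10.20)."""
    b = beta0(omega, D)
    return T1 * math.exp(-b * x) * math.cos(b * x - omega * t)

def heat_equation_residual(x, t, T1, omega, D, h=1e-3):
    """Central-difference estimate of dT/dt - D d2T/dx2; it should be close to zero."""
    f = lambda xx, tt: temperature(xx, tt, T1, omega, D)
    dT_dt = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    d2T_dx2 = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h ** 2
    return dT_dt - D * d2T_dx2
```

For the yearly cycle ω is 2π divided by one year. If you assume a diffusivity D of order 10⁻⁶ m²/s for the ground (an illustrative order of magnitude, not a number taken from this text), the half-wavelength depth π/β_0 comes out at roughly ten meters.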

10.4 Spatial Temperature Distributions 

The governing equation is Eq. (10.5). For an example of a problem that falls under this heading, take a cube that is 
heated on one side and cooled on the other five sides. What is the temperature distribution within the cube? How does 
it vary in time? 

I'll take a simpler version of this problem to start with. First, I'll work in two dimensions instead of three; make it 
a very long rectangular shaped rod, extending in the z-direction. Second, I'll look for the equilibrium solution, for which
the time derivative is zero. These restrictions reduce the equation (10.5) to 


$$ \nabla^2 T = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0 \tag{10.21} $$


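I'll specify the temperature T(x,y) on the surface of the rod to be zero on three faces and T_0 on the fourth. Place the
coordinates so that the length of the rod is along the z-axis and the origin is in one corner of the rectangle.

$$ \begin{aligned} T(0,y) &= 0 \quad (0 < y < b), & T(x,0) &= 0 \quad (0 < x < a) \\ T(a,y) &= 0 \quad (0 < y < b), & T(x,b) &= T_0 \quad (0 < x < a) \end{aligned} \tag{10.22} $$

[Figure: the rectangular cross section 0 < x < a, 0 < y < b, with the x-axis along its bottom edge.]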


Look at this problem from several different angles, tear it apart, look at a lot of special cases, and see what can 
go wrong. In the process you'll see different techniques and especially a lot of applications of Fourier series. This single 
problem will illustrate many of the methods used to understand boundary value problems. 
Use the same method used before for heat flow in one dimension: separation of variables. Assume a solution to
be the product of a function of x and a function of y, then plug into the equation.

$$ T(x,y) = f(x)g(y), \qquad\text{then}\qquad \nabla^2 T = \frac{d^2 f(x)}{dx^2}\,g(y) + f(x)\,\frac{d^2 g(y)}{dy^2} = 0 $$


Just as in Eq. (10.8), when you divide by fg the resulting equation is separated into a term involving x only and one 
involving y only. 

$$ \frac{1}{f}\frac{d^2 f(x)}{dx^2} + \frac{1}{g}\frac{d^2 g(y)}{dy^2} = 0 $$

Because x and y can be varied independently, these must be constants adding to zero. 


$$ \frac{1}{f}\frac{d^2 f(x)}{dx^2} = \alpha, \qquad\text{and}\qquad \frac{1}{g}\frac{d^2 g(y)}{dy^2} = -\alpha \tag{10.23} $$


As before, the separation constant can be any real or complex number until you start applying boundary conditions. You 
recognize that the solutions to these equations can be sines or cosines or exponentials or hyperbolic functions or linear 
functions, depending on what α is.

The boundary conditions state that the surface temperature is held at zero on the surfaces x = 0 and x = a.
This suggests looking for solutions that vanish there, and that in turn says you should work with sines of x. In the other
direction the surface temperature vanishes on only one side so you don't need sines in that case. The α = 0 case gives
linear functions in x and in y, and the fact that the temperature vanishes on x = 0 and x = a kills these terms. (It does,
doesn't it?) Pick α to be a negative real number: call it α = −k².


$$ \frac{d^2 f(x)}{dx^2} = -k^2 f \quad\Longrightarrow\quad f(x) = A\sin kx + B\cos kx $$

The accompanying equation for g is now

$$ \frac{d^2 g(y)}{dy^2} = +k^2 g \quad\Longrightarrow\quad g(y) = C\sinh ky + D\cosh ky $$

(Or exponentials if you prefer.) The combined, separated solution to ∇²T = 0 is

$$ (A\sin kx + B\cos kx)(C\sinh ky + D\cosh ky) \tag{10.24} $$
The general solution will be a sum of these, summed over various values of k. This is where you have to apply the
boundary conditions to determine the allowed k's.

left: T(0,y) = 0 = B(C sinh ky + D cosh ky), so B = 0

(This holds for all y in 0 < y < b, so the second factor can't vanish unless both C and D vanish. If that is the case
then everything vanishes.)

right: T(a,y) = 0 = A sin ka (C sinh ky + D cosh ky), so sin ka = 0

(The factor with y can't vanish or everything vanishes. If A = 0 then everything vanishes. All that's left is sin ka = 0.)

bottom: T(x,0) = 0 = A sin kx D, so D = 0

(If A = 0 everything is zero, so it has to be D.)

You can now write a general solution that satisfies three of the four boundary conditions. Combine the coefficients 
A and C into one, and since it will be different for different values of k, call it γ_n.

$$ T(x,y) = \sum_{n=1}^{\infty} \gamma_n \sin\frac{n\pi x}{a}\,\sinh\frac{n\pi y}{a} \tag{10.25} $$

The nπ/a appears because sin ka = 0, and the limits on n omit the negative n because they are redundant.

Now to find all the unknown constants γ_n, and as before that's where Fourier techniques come in. The fourth
side, at y = b, has temperature T_0 and that implies

$$ \sum_{n=1}^{\infty} \gamma_n \sin\frac{n\pi x}{a}\,\sinh\frac{n\pi b}{a} = T_0 $$

On this interval 0 < x < a these sine functions are orthogonal, so you take the scalar product of both sides with the sine.

$$ \int_0^a dx\,\sin\frac{m\pi x}{a}\,\sum_{n=1}^{\infty} \gamma_n \sin\frac{n\pi x}{a}\,\sinh\frac{n\pi b}{a} = \int_0^a dx\,\sin\frac{m\pi x}{a}\;T_0 $$

$$ \frac{a}{2}\,\gamma_m \sinh\frac{m\pi b}{a} = T_0\,\frac{a}{m\pi}\left[1 - (-1)^m\right] $$






Just the odd m terms are present, m = 2ℓ + 1, so the result is

$$ T(x,y) = \frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\; \frac{\sinh\big((2\ell+1)\pi y/a\big)}{\sinh\big((2\ell+1)\pi b/a\big)}\; \sin\frac{(2\ell+1)\pi x}{a} \tag{10.26} $$


You’re not done. 

Does this make sense? The dimensions are clearly correct, but after that it takes some work. There's really just 
one parameter that you have to play around with, and that's the ratio b/a. If it's either very big or very small you may 
be able to check the result. 


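[Figure: two rectangles, each with x running from 0 to a. Left panel: a ≫ b (short and wide). Right panel: b ≫ a (tall and narrow).]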

If a ≫ b, it looks almost like a one-dimensional problem. It is a thin slab with temperature T_0 on one side and
zero on the other. There's little variation along the x-direction, so the equilibrium equation is

$$ \nabla^2 T = 0 = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \approx \frac{\partial^2 T}{\partial y^2} $$

This simply says that the second derivative with respect to y vanishes, so the answer is the straight line T = A + By,
and with the condition that you know the temperature at y = 0 and at y = b you easily find

$$ T(x,y) \approx T_0\,y/b $$

Does the exact answer look like this? It doesn't seem to, but look closer. If b ≪ a then because 0 < y < b you also
have y ≪ a. The hyperbolic function factors in Eq. (10.26) will have very small arguments, proportional to b/a. Recall
the power series expansion of the hyperbolic sine: sinh x = x + ···. These factors become approximately

$$ \frac{\sinh\big((2\ell+1)\pi y/a\big)}{\sinh\big((2\ell+1)\pi b/a\big)} \approx \frac{(2\ell+1)\pi y/a}{(2\ell+1)\pi b/a} = \frac{y}{b} $$
The temperature solution is then

$$ T(x,y) \approx \frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\,\frac{y}{b}\,\sin\frac{(2\ell+1)\pi x}{a} = T_0\,\frac{y}{b} $$

Where did that last equation come from? The coefficient of y/b is just the Fourier series of the constant To in terms of 
sines on 0 < x < a. 

What about the opposite extreme, for which b ≫ a? This is the second picture just above. Instead of being short
and wide it is tall and narrow. For this case, look again at the arguments of the hyperbolic sines. Now πb/a is large and
you can approximate the hyperbolic functions by going back to their definition.

$$ \sinh x = \frac{e^x - e^{-x}}{2} \approx \frac{1}{2}e^x, \qquad\text{for } x \gg 1 $$

The denominators in all the terms of Eq. (10.26) are large, ∼ e^{πb/a} (or larger still because of the (2ℓ + 1)). This will
make all the terms in the series extremely small unless the numerators are correspondingly large. This means that the
temperature stays near zero unless y is large. That makes sense. It's only for y near the top end that you are near to
the wall with temperature T_0.

You now have the case for which b ≫ a and y ≫ a. This means that I can use the approximate form of the
hyperbolic function for large arguments.

$$ \frac{\sinh\big((2\ell+1)\pi y/a\big)}{\sinh\big((2\ell+1)\pi b/a\big)} \approx \frac{e^{(2\ell+1)\pi y/a}}{e^{(2\ell+1)\pi b/a}} = e^{(2\ell+1)\pi(y-b)/a} $$

The temperature distribution is now approximately 

$$ T(x,y) \approx \frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\; e^{-(2\ell+1)\pi(b-y)/a}\; \sin\frac{(2\ell+1)\pi x}{a} \tag{10.27} $$

As compared to the previous approximation where a ≫ b, you can't as easily tell whether this is plausible or not. You
can however learn from it. See also problem 10.30.

At the very top, where y = b this reduces to the constant To that you're supposed to have at that position. Recall 
the Fourier series for a constant on 0 < x < a. 


$$ \frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\,\sin\frac{(2\ell+1)\pi x}{a} = T_0 \qquad (0 < x < a) $$
Move down from y = b by the distance a, so that b − y = a. That's a distance from the top equal to the width
of the rectangle. It's still rather close to the end, but look at the series for that position.

[Figure: the rod between x = 0 and x = a, with the levels y = b and y = b − a marked.]

$$ T(x, b-a) \approx \frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\; e^{-(2\ell+1)\pi}\; \sin\frac{(2\ell+1)\pi x}{a} $$

For ℓ = 0, the exponential factor is e^{−π} = 0.043, and for ℓ = 1 this factor is e^{−3π} = 0.00008. This means that measured
from the T_0 end, within the very short distance equal to the width, the temperature has dropped 95% of the way down
to its limiting value of zero. The temperature in the rod is quite uniform until you are very close to the heated end.
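Both limiting cases can be checked against the full series. The sketch below sums Eq. (10.26) directly; the function names and the number of terms are assumptions made for the illustration, and the ratio of hyperbolic sines is rewritten with decaying exponentials only so that large arguments don't overflow.

```python
import math

def sinh_ratio(u, v):
    """sinh(u)/sinh(v) for 0 <= u <= v and v > 0, written to avoid overflow."""
    return math.exp(u - v) * (1.0 - math.exp(-2.0 * u)) / (1.0 - math.exp(-2.0 * v))

def rod_temperature(x, y, a, b, T0=1.0, n_terms=4000):
    """Partial sum of Eq. (10.26) over l = 0 .. n_terms - 1."""
    total = 0.0
    for l in range(n_terms):
        m = 2 * l + 1
        ratio = sinh_ratio(m * math.pi * y / a, m * math.pi * b / a)
        total += ratio * math.sin(m * math.pi * x / a) / m
    return 4.0 * T0 / math.pi * total

if __name__ == "__main__":
    # Short and wide: expect about T0 * y / b in the middle.
    print(rod_temperature(10.0, 0.5, a=20.0, b=1.0))
    # Tall and narrow, one width below the hot end, on the centerline.
    print(rod_temperature(0.5, 4.0, a=1.0, b=5.0))
```

On the centerline the second case picks up sin((2ℓ+1)π/2) = ±1, so the value is (4/π)(e^{−π} − e^{−3π}/3 + ···), about 5% of T_0, in line with the estimate above.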


The Heat Flow into the Box 

All the preceding analysis and discussion was intended to make this problem and its solution sound oh-so-plausible. 
There's more, and it isn't pretty. 

The temperature on one of the four sides was given as different from the temperatures on the other three sides. 
What will the heat flow into the region be? That is, what power must you supply to maintain the temperature To on 
the single wall? 

At the beginning of this chapter, Eq. (10.1), you have the equation for the power through an area A, but that 
equation assumed that the temperature gradient dT/dx is the same all over the area A. If it isn’t, you simply turn it 
into a density. 


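$$ \Delta P = -\kappa\,\Delta A\,\frac{\partial T}{\partial x}, \qquad\text{and then}\qquad \frac{\Delta P}{\Delta A} \to \frac{dP}{dA} = -\kappa\,\frac{\partial T}{\partial x} \tag{10.28} $$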


Equivalently, just use the vector form from Eq. (10.6), H = −κ∇T. In Eq. (10.22) the temperature is T_0 along y = b,
and the power density (energy / (time · area)) flowing in the +y direction is −κ ∂T/∂y, so the power density flowing
into this area has the reversed sign,

$$ +\kappa\,\frac{\partial T}{\partial y} \tag{10.29} $$

The total power flow is the integral of this over the area of the top face.

Let L be the length of this long rectangular rod, its extent in the z-direction. The element of area along the
surface at y = b is then dA = L dx, and the power flow into this face is

$$ \int_0^a L\,dx\,\kappa\,\frac{\partial T}{\partial y}\bigg|_{y=b} $$
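The temperature function is the solution Eq. (10.26), so differentiate that equation with respect to y.

$$ \int_0^a L\,dx\,\kappa\,\frac{4}{\pi}\,T_0 \sum_{\ell=0}^{\infty} \frac{[(2\ell+1)\pi/a]\,\cosh\big((2\ell+1)\pi y/a\big)}{(2\ell+1)\,\sinh\big((2\ell+1)\pi b/a\big)}\; \sin\frac{(2\ell+1)\pi x}{a} \qquad\text{at } y = b $$

$$ = \frac{4L\kappa T_0}{a} \int_0^a dx \sum_{\ell=0}^{\infty} \coth\frac{(2\ell+1)\pi b}{a}\; \sin\frac{(2\ell+1)\pi x}{a} $$

The factor coth((2ℓ+1)πb/a) is greater than 1 and approaches 1 as ℓ grows, so it does nothing to help the convergence,
and this sum does not converge. I'm going to push ahead anyway, temporarily pretending that I didn't notice this minor
difficulty with the series. Just go ahead and integrate the series term by term and hope for the best.

$$ = \frac{4L\kappa T_0}{a} \sum_{\ell=0}^{\infty} \coth\frac{(2\ell+1)\pi b}{a}\; \frac{a}{\pi(2\ell+1)}\Big[-\cos\big((2\ell+1)\pi\big) + 1\Big] \;\ge\; \frac{4L\kappa T_0}{\pi} \sum_{\ell=0}^{\infty} \frac{2}{2\ell+1} = \infty $$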


This infinite series for the total power entering the top face is infinite. The series doesn’t converge (use the integral 
test). 
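You can watch the divergence happen numerically. The sketch below adds up the terms 2/(2ℓ+1) of the power series; the function name is made up for the illustration, and the optional weight is the factor cosh/sinh = coth((2ℓ+1)πb/a) that the y-derivative of Eq. (10.26) carries at y = b. The partial sums grow like the logarithm of the number of terms, slowly but without limit.

```python
import math

def power_partial_sum(n_terms, b_over_a=None):
    """Sum of 2/(2l+1) for l < n_terms, optionally weighted by coth((2l+1) pi b/a)."""
    total = 0.0
    for l in range(n_terms):
        m = 2 * l + 1
        weight = 1.0
        if b_over_a is not None:
            weight = 1.0 / math.tanh(m * math.pi * b_over_a)
        total += 2.0 * weight / m
    return total

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000, 100000):
        # The difference between the two columns settles to a constant.
        print(n, power_partial_sum(n), math.log(n))
```

Multiplying the number of terms by 100 adds about ln 100 ≈ 4.6 to the sum every time, which is the integral test in action.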

This innocuous-seeming problem is suddenly pathological because it would take an infinite power source to maintain 
this temperature difference. Why should that be? Look at the corners. You’re trying to maintain a non-zero temperature 
difference (T_0 − 0) between two walls that are touching. This can't happen, and the equations are telling you so! It means
that the boundary conditions specified in Eq. (10.22) are impossible to maintain. The temperature on the boundary at 
y = b can't be constant all the way to the edge. It must drop off to zero as it approaches x = 0 and x = a. This makes 
the problem more difficult, but then reality is typically more complicated than our simple, idealized models. 

Does this make the solution Eq. (10.26) valueless? No, it simply means that you can’t push it too hard. This 
solution will be good until you get near the corners, where you can't possibly maintain the constant-temperature boundary 
condition. In other regions it will be a good approximation to the physical problem. 

10.5 Specified Heat Flow 

In the previous examples, I specified the temperature on the boundaries and from that I determined the temperature 
inside. In the particular example, the solution was not physically plausible all the way to the edge, though the mathematics 
were (I hope) enlightening. Instead, I'll reverse the process and try to specify the size of the heat flow, computing the 
resulting temperature from that. This time perhaps the results will be a better reflection of reality. 
Equation (10.29) tells you the power density at the surface, and I'll examine the case for which this is a constant.
Call it F_0. (There's not a conventional symbol, so this will do.) The plus sign occurs because the flow is into the box.

$$ +\kappa\,\frac{\partial T}{\partial y}(x, b) = F_0 $$

The other three walls have the same zero temperature conditions as Eq. (10.22). Which forms of the separated solutions 
must I use now? The same ones as before or different ones? 

Look again at the α = 0 solutions to Eqs. (10.23). That solution is

$$ (A + Bx)(C + Dy) $$

In order to handle the fact that the temperature is zero at y = 0 and that the derivative with respect to y is given at
y = b,

$$ (A + Bx)(C) = 0 \qquad\text{and}\qquad (A + Bx)(D) = F_0/\kappa, $$

$$ \text{implying}\quad C = 0 = B, \quad\text{then}\quad AD = F_0/\kappa \;\Longrightarrow\; \frac{F_0}{\kappa}\,y \tag{10.30} $$

K 

This matches the boundary conditions at both y = 0 and y = b. All that's left is to make everything work at the other 
two faces. 



If I can find a solution that equals −F_0 y/κ on the left and right faces then it will cancel the +F_0 y/κ that
Eq. (10.30) provides. But I can't disturb the top and bottom boundary conditions. The way to do that is to find
functions that equal zero at y = 0 and whose derivative equals zero at y = b. This is a familiar sort of condition that 
showed up several times in chapter five on Fourier series. It is equivalent to saying that the top surface is insulated so 
that heat can't flow through it. You then use superposition to combine the solution with uniform heat flow and the 
solution with an insulated boundary. 
Instead of Eq. (10.24), use the opposite sign for α, so the solutions are of the form

$$ (A\sin ky + B\cos ky)(C\sinh kx + D\cosh kx) $$

I require that this equals zero at y = 0, so that says

$$ (0 + B)(C\sinh kx + D\cosh kx) = 0 $$

so B = 0. Now require that the derivative equals zero at y = b, so

$$ Ak\cos kb = 0, \qquad\text{or}\qquad kb = (n + \tfrac12)\pi \quad\text{for}\quad n = 0, 1, 2, \ldots $$

The value of the temperature is the same on the left that it is on the right, so

$$ C\sinh k0 + D\cosh k0 = C\sinh ka + D\cosh ka \;\Longrightarrow\; C = D(1 - \cosh ka)/\sinh ka \tag{10.31} $$


This is starting to get messy, so it’s time to look around and see if I’ve missed anything that could simplify the 
calculation. There's no guarantee that there is any simpler way, but it is always worth looking. The fact that the system 
is the same on the left as on the right means that the temperature will be symmetric about the central axis of the 
box, about x = a/2. That it is even about this point implies that the hyperbolic functions of x should be even about 
x = a/2. You can do this simply by using a cosh about that point. 


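$$ A\sin ky\,\Big(D\cosh k\big(x - \tfrac{a}{2}\big)\Big) $$

Put these together and you have a sum

$$ \sum_{n=0}^{\infty} a_n \sin\left(\frac{(n+\tfrac12)\pi y}{b}\right) \cosh\left(\frac{(n+\tfrac12)\pi (x - \tfrac{a}{2})}{b}\right) \tag{10.32} $$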


Each of these terms satisfies Laplace's equation, satisfies the boundary conditions at y = 0 and y = b, and is even 
about the centerline x = a/2. It is now a problem in Fourier series to match the conditions at x = 0. They’re then 
automatically satisfied at x = a. 


$$ \sum_{n=0}^{\infty} a_n \sin\left(\frac{(n+\tfrac12)\pi y}{b}\right) \cosh\left(\frac{(n+\tfrac12)\pi a}{2b}\right) = -F_0\,\frac{y}{\kappa} \tag{10.33} $$
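The sines are orthogonal by the theorem Eq. (5.15), so you can pick out the component a_n by the orthogonality of these
basis functions.

$$ u_n = \sin\left(\frac{(n+\tfrac12)\pi y}{b}\right), \qquad\text{then}\qquad \big\langle u_m, \text{left side}\big\rangle = \big\langle u_m, \text{right side}\big\rangle $$

$$ \text{or,}\qquad a_m \big\langle u_m, u_m\big\rangle \cosh\left(\frac{(m+\tfrac12)\pi a}{2b}\right) = -\frac{F_0}{\kappa}\,\big\langle u_m, y\big\rangle $$

Write this out; do the integrals, add the linear term, and you have

$$ T(x,y) = F_0\,\frac{y}{\kappa} - \frac{8F_0 b}{\kappa\pi^2} \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)^2}\, \sin\left(\frac{(n+\tfrac12)\pi y}{b}\right) \cosh\left(\frac{(n+\tfrac12)\pi (x - \tfrac{a}{2})}{b}\right) \operatorname{sech}\left(\frac{(n+\tfrac12)\pi a}{2b}\right) \tag{10.34} $$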


Now analyze this to see if it makes sense. I'll look at the same cases as the last time: b ≪ a and a ≪ b. The
simpler case, where the box is short and wide, has b ≪ a. This makes the arguments of the cosh and sech large, with
an a/b in them. For large argument you can approximate the cosh by

$$ \cosh x \approx e^x/2, \qquad x \gg 1 $$

Now examine a typical term in the sum (10.34), and I have to be a little more specific and choose x on the left or right
of a/2. The reason for that is the preceding equation requires x large and positive. I'll take x on the right, as it makes
no difference. The hyperbolic functions in (10.34) are approximately

$$ \frac{\exp\big((n+\tfrac12)\pi(x - \tfrac{a}{2})/b\big)}{\exp\big((n+\tfrac12)\pi a/2b\big)} = e^{(2n+1)\pi(x-a)/2b} $$

As long as x is not near the end, that is, not near x = a, the quantity in the exponential is large and negative for all
n. The exponential in turn makes this extremely small so that the entire sum becomes negligible. The temperature
distribution is then the single term

$$ T(x,y) \approx F_0\,\frac{y}{\kappa} $$

It's essentially a one dimensional problem, with the heat flow only along the —y direction. 
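A numerical check of Eq. (10.34) takes only a few lines. The sketch below sums the series and lets you confirm that the temperature vanishes on the side walls and reduces to F_0 y/κ in the middle of a short, wide box. The function names, the default values F_0 = κ = 1, and the number of terms are assumptions made for the illustration; the ratio cosh/cosh (that is, cosh times sech) is rewritten with decaying exponentials so that large arguments don't overflow.

```python
import math

def cosh_ratio(u, v):
    """cosh(u)/cosh(v) for |u| <= v, written to avoid overflow."""
    u = abs(u)
    return math.exp(u - v) * (1.0 + math.exp(-2.0 * u)) / (1.0 + math.exp(-2.0 * v))

def box_temperature(x, y, a, b, F0=1.0, kappa=1.0, n_terms=20000):
    """Partial sum of Eq. (10.34): the linear term plus the correcting series."""
    total = 0.0
    for n in range(n_terms):
        k = (n + 0.5) * math.pi / b
        ratio = cosh_ratio(k * (x - a / 2.0), k * a / 2.0)
        total += (-1) ** n / (2 * n + 1) ** 2 * math.sin(k * y) * ratio
    return F0 * y / kappa - 8.0 * F0 * b / (kappa * math.pi ** 2) * total

if __name__ == "__main__":
    print(box_temperature(0.0, 0.3, a=2.0, b=1.0))    # on the left wall
    print(box_temperature(10.0, 0.5, a=20.0, b=1.0))  # middle of a wide box
```

The first value should be zero up to the truncation of the series, and the second should be very close to F_0 y/κ = 0.5.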

In the reverse case for which the box is tall and thin, a ≪ b, the arguments of the hyperbolic functions are small.
This invites a power series expansion, but that approach doesn’t work. The analysis of this case is quite tricky, and I 
finally concluded that it's not worth the trouble to write it up. It leads to a rather complicated integral. 
10.6 Electrostatics

The equation for the electrostatic potential in a vacuum is exactly the same as Eq. (10.21) for the temperature in static
equilibrium, ∇²V = 0, with the electric field E = −∇V. The same equation applies to the gravitational potential,
Eq. (9.42).

Perhaps you’ve looked into a microwave oven. You can see inside it, but the microwaves aren’t supposed to get 
out. How can this be? Light is just another form of electromagnetic radiation, so why does one EM wave get through 
while the other one doesn’t? I won't solve the whole electromagnetic radiation problem here, but I’ll look at the static 
analog to get some general idea of what’s happening. 



[Figure: conducting strips laid side by side along the x-axis, with edges at x = −L, 0, L, 2L.]


Arrange a set of conducting strips in the x-y plane and with insulation between them so that they don't quite 
touch each other. Now apply voltage V_0 on every other one so that the potentials are alternately zero and V_0. This sets
the potential in the z = 0 plane to be independent of y and

$$ z = 0:\qquad V(x,y) = \begin{cases} V_0 & (0 < x < L) \\ 0 & (L < x < 2L) \end{cases} \qquad V(x + 2L, y) = V(x,y), \ \text{all } x,\, y \tag{10.35} $$


What is then the potential above the plane, z > 0? Above the plane V satisfies Laplace's equation, 


$$ \nabla^2 V = \frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = 0 \tag{10.36} $$


The potential is independent of y in the plane, so it will be independent of y everywhere. Separate variables in the 
remaining coordinates. 

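$$ V(x,z) = f(x)g(z) \;\Longrightarrow\; \frac{d^2 f}{dx^2}\,g + f\,\frac{d^2 g}{dz^2} = 0 \;\Longrightarrow\; \frac{1}{f}\frac{d^2 f}{dx^2} + \frac{1}{g}\frac{d^2 g}{dz^2} = 0 $$

This is separated as a function of x plus a function of z, so the terms are constants.

$$ \frac{1}{f}\frac{d^2 f}{dx^2} = -\alpha^2, \qquad \frac{1}{g}\frac{d^2 g}{dz^2} = +\alpha^2 \tag{10.37} $$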
I've chosen the separation constant in this form because the boundary condition is periodic in x, and that implies that
I'll want oscillating functions there, not exponentials.

$$ f(x) = e^{i\alpha x} \quad\text{and}\quad f(x + 2L) = f(x) \;\Longrightarrow\; e^{2Li\alpha} = 1, \quad\text{or}\quad 2L\alpha = 2n\pi, \quad n = 0, \pm 1, \pm 2, \ldots $$

The separated solutions are then

$$ f(x)g(z) = e^{n\pi i x/L}\left(A\,e^{n\pi z/L} + B\,e^{-n\pi z/L}\right) \tag{10.38} $$

The solution for z > 0 is therefore the sum 

$$ V(x,z) = \sum_{n=-\infty}^{\infty} e^{n\pi i x/L}\left(A_n e^{n\pi z/L} + B_n e^{-n\pi z/L}\right) \tag{10.39} $$

The coefficients A_n and B_n are to be determined by Fourier techniques. First however, look at the z-behavior. As you
move away from the plane toward positive z, the potential should not increase without bound. Terms such as e^{πz/L}
however increase with z. This means that the coefficients of the terms that increase exponentially in z cannot be there.

$$ A_n = 0 \ \text{ for } n > 0, \qquad\text{and}\qquad B_n = 0 \ \text{ for } n < 0 $$

$$ V(x,z) = A_0 + B_0 + \sum_{n=1}^{\infty} e^{n\pi i x/L}\, B_n e^{-n\pi z/L} + \sum_{n=-\infty}^{-1} e^{n\pi i x/L}\, A_n e^{n\pi z/L} \tag{10.40} $$

The combined constant A_0 + B_0 is really one constant; you can call it C_0 if you like. Now use the usual Fourier
techniques given that you know the potential at z = 0.

$$ V(x,0) = C_0 + \sum_{n=1}^{\infty} B_n e^{n\pi i x/L} + \sum_{n=-\infty}^{-1} A_n e^{n\pi i x/L} $$

The scalar product of e^{mπix/L} with this equation is

$$ \left\langle e^{m\pi i x/L},\, V(x,0)\right\rangle = \begin{cases} 2L\,C_0 & (m = 0) \\ 2L\,B_m & (m > 0) \\ 2L\,A_m & (m < 0) \end{cases} \tag{10.41} $$
Now evaluate the integral on the left side. First, m ≠ 0:

$$ \left\langle e^{m\pi i x/L},\, V(x,0)\right\rangle = \int_{-L}^{L} dx\, e^{-m\pi i x/L} \begin{cases} 0 & (-L < x < 0) \\ V_0 & (0 < x < L) \end{cases} $$

$$ = V_0 \int_0^L dx\, e^{-m\pi i x/L} = V_0\,\frac{L}{-m\pi i}\, e^{-m\pi i x/L}\,\bigg|_0^L $$

$$ = V_0\,\frac{L}{-m\pi i}\left[(-1)^m - 1\right] $$

Then evaluate it separately for m = 0, and you have ⟨1, V(x,0)⟩ = V_0 L.
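If you distrust the algebra, a direct numerical integration settles it. The sketch below estimates the scalar product with the midpoint rule and compares it to the closed form just derived. The function names, the default values L = V_0 = 1, and the number of steps are assumptions made for the illustration.

```python
import cmath
import math

def fourier_projection(m, L=1.0, V0=1.0, n_steps=20000):
    """Midpoint-rule estimate of <e^{m pi i x/L}, V(x,0)> over -L < x < L."""
    dx = 2.0 * L / n_steps
    total = 0.0 + 0.0j
    for j in range(n_steps):
        x = -L + (j + 0.5) * dx
        V = V0 if 0.0 < x < L else 0.0
        # The scalar product takes the complex conjugate of the first factor.
        total += cmath.exp(-1j * m * math.pi * x / L) * V * dx
    return total

def closed_form(m, L=1.0, V0=1.0):
    """V0 L for m = 0, otherwise V0 L [(-1)^m - 1] / (-m pi i)."""
    if m == 0:
        return complex(V0 * L)
    return V0 * L * ((-1) ** m - 1) / (-m * math.pi * 1j)

if __name__ == "__main__":
    for m in range(-3, 4):
        print(m, fourier_projection(m), closed_form(m))
```

The even nonzero m give zero and the odd m give a purely imaginary number, which is why the final answer turns into a sum of sines over odd terms.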

Now assemble the result. Before plunging in, look at what will happen. 

The m = 0 term sits by itself. 

For the other terms, only odd m have non-zero values. 

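$$ \begin{aligned} V(x,z) = \frac12 V_0 &+ V_0 \sum_{m=1}^{\infty} \frac{1}{-2m\pi i}\left[(-1)^m - 1\right] e^{m\pi i x/L}\, e^{-m\pi z/L} \\ &+ V_0 \sum_{m=-\infty}^{-1} \frac{1}{-2m\pi i}\left[(-1)^m - 1\right] e^{m\pi i x/L}\, e^{+m\pi z/L} \end{aligned} \tag{10.42} $$

To put this into a real form that is easier to interpret, change variables, letting m = −n in the second sum and m = n
in the first, finally changing the sum so that it is over just the odd terms.

$$ \begin{aligned} V(x,z) &= \frac12 V_0 + V_0 \sum_{n=1}^{\infty} \frac{1}{-2n\pi i}\left[(-1)^n - 1\right] e^{n\pi i x/L}\, e^{-n\pi z/L} + V_0 \sum_{n=1}^{\infty} \frac{1}{+2n\pi i}\left[(-1)^n - 1\right] e^{-n\pi i x/L}\, e^{-n\pi z/L} \\ &= \frac12 V_0 + V_0 \sum_{n=1}^{\infty} \left[(-1)^n - 1\right] \frac{1}{-n\pi}\, \sin(n\pi x/L)\, e^{-n\pi z/L} \\ &= \frac12 V_0 + \frac{2}{\pi}\,V_0 \sum_{\ell=0}^{\infty} \frac{1}{2\ell+1}\, \sin\big((2\ell+1)\pi x/L\big)\, e^{-(2\ell+1)\pi z/L} \end{aligned} \tag{10.43} $$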
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Having done all the work to get to the answer, what can I learn from it? 

What does it look like? 

Are there any properties of the solution that are unexpected? 

Should I have anticipated the form of the result? 

Is there an easier way to get to the result? 

To see what it looks like, examine some values of $z$, the distance above the surface. If $z = L$, the coefficient for
successive terms is

$$\ell = 0:\ \frac{2}{\pi}\, e^{-\pi} = 0.028 \qquad\qquad \ell = 1:\ \frac{2}{3\pi}\, e^{-3\pi} = 1.7 \times 10^{-5} \tag{10.44}$$

The constant term is the average potential, and the $\ell = 0$ term adds only a modest ripple, about 5% of the constant
average value. If you move up to $z = 2L$ the first factor is 0.0012 and that's a little more than 0.2% ripple. The sharp
jumps from $+V_0$ to zero and back disappear rapidly. That the oscillations vanish so quickly with distance is perhaps not
what you would guess until you have analyzed such a problem.
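
The numbers quoted in Eq. (10.44) are easy to reproduce. The following is a small sketch, assuming the coefficient of the $\ell$-th oscillating term is $\bigl(2/((2\ell+1)\pi)\bigr) e^{-(2\ell+1)\pi z/L}$ in units of $V_0$; the function name is an illustrative choice, not anything from the text.

```python
import math

def ripple_coefficient(ell, z_over_L):
    """Coefficient of the sin((2l+1) pi x/L) term, in units of V_0."""
    n = 2 * ell + 1
    return (2.0 / (math.pi * n)) * math.exp(-n * math.pi * z_over_L)

for z_over_L in (0.5, 1.0, 1.5, 2.0):
    c0 = ripple_coefficient(0, z_over_L)
    c1 = ripple_coefficient(1, z_over_L)
    # The constant term is V_0/2, so the relative ripple is c0 / 0.5.
    print(f"z/L = {z_over_L}: l=0 -> {c0:.2e}, l=1 -> {c1:.2e}, ripple = {c0 / 0.5:.2%}")
```

Each step of $L/2$ in height multiplies the leading ripple by $e^{-\pi/2} \approx 0.21$, which is why the square wave washes out so fast.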



[Figure: the potential $V(x, z)$ plotted against $x$ at several heights; the curves $V(x, 0)$ and $V(x, L/2)$ are labeled.]

The graph shows the potential function at the surface, $z = 0$, as it oscillates between $V_0$ and zero. It then shows
successive graphs of Eq. (10.43) at $z = L/2$, then at $z = L$, then at $z = 1.5L$. The ripple is barely visible at
that last distance. The radiation through the screen of a microwave oven is filtered in much the same way because the
wavelength of the radiation is large compared to the size of the holes in the screen.

When you write the form of the series for the potential, Eq. (10.40), you can see this coming if you look for it.
The oscillating terms in $x$ are accompanied by exponential terms in $z$, and the rapid damping with distance is already
apparent: $e^{-n\pi z/L}$. You don't have to solve for a single coefficient to see that the oscillations vanish very rapidly with
distance.

The original potential on the surface was neither even nor odd, but except for the constant average value, it is an 
odd function. 


$$z = 0:\quad V(x,y) = \frac{1}{2}V_0 + \begin{cases} +V_0/2 & (0 < x < L) \\ -V_0/2 & (L < x < 2L) \end{cases} \qquad V(x + 2L, y) = V(x,y) \tag{10.45}$$




Solve the potential problem for the constant $V_0/2$ and you have a constant. Solve it for the remaining odd function
on the boundary and you should expect an odd function for $V(x,z)$. If you make these observations before solving the
problem you can save yourself some algebra, as it will lead you to the form of the solution faster.

The potential is periodic on the $x$-$y$ plane, so periodic boundary conditions are the appropriate ones. You can
express these in more than one way, taking as a basis for the expansion either complex exponentials or sines and cosines.

$$e^{n\pi i x/L},\ n = 0, \pm 1, \ldots \qquad\text{or the combination}\qquad \cos(n\pi x/L),\ n = 0, 1, \ldots \quad \sin(n\pi x/L),\ n = 1, 2, \ldots \tag{10.46}$$


For a random problem with no special symmetry the exponential choice typically leads to easier integrals. In this case
the boundary condition has some symmetry that you can take advantage of: it's almost odd. The constant term in
Eq. (10.30) is the $n = 0$ element of the cosine set, and that's necessarily orthogonal to all the sines. For the rest, you
do the expansion

$$\begin{cases} +V_0/2 & (0 < x < L) \\ -V_0/2 & (L < x < 2L) \end{cases} \;=\; \sum_{n=1}^{\infty} a_n \sin(n\pi x/L)$$


The odd term in the boundary condition (10.45) is necessarily a sum of sines, with no cosines. The cosines are orthogonal 
to an odd function. See problem 10.11. 
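
For orientation, the coefficients follow from the orthogonality of the sines over one period, $0 < x < 2L$. This is only a sketch of the integral involved, assuming the odd part of the boundary function is $\pm V_0/2$ as in Eq. (10.45):

```latex
a_n = \frac{1}{L}\int_0^{2L} dx\, \sin\frac{n\pi x}{L}\,\Bigl[V(x,0) - \tfrac{1}{2}V_0\Bigr]
    = \frac{V_0}{2L}\left[\int_0^{L} - \int_L^{2L}\right] dx\, \sin\frac{n\pi x}{L}
    = \frac{V_0}{n\pi}\bigl[1 - (-1)^n\bigr]
```

So $a_n = 2V_0/n\pi$ for odd $n$ and zero for even $n$, in agreement with the coefficients in Eq. (10.43).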


More Electrostatic Examples 

Specify the electric potential in the $x$-$y$ plane to be an array, periodic in both the $x$ and the $y$-directions. $V(x, y, z = 0)$
is $V_0$ on the rectangle ($0 < x < a$, $0 < y < b$) as well as in the darkened boxes in the picture; it is zero in the white
boxes. What is the potential for $z > 0$?



The equation is still Eq. (10.36), but now you have to do the separation of variables along all three coordinates,
$V(x,y,z) = f(x)g(y)h(z)$. Substitute into the Laplace equation and divide by $fgh$.

$$\frac{1}{f}\frac{d^2 f}{dx^2} + \frac{1}{g}\frac{d^2 g}{dy^2} + \frac{1}{h}\frac{d^2 h}{dz^2} = 0$$

These terms are functions of the single variables $x$, $y$, and $z$ respectively, so the only way this can work is if they are
separately constant.

$$\frac{1}{f}\frac{d^2 f}{dx^2} = -k_1^2, \qquad \frac{1}{g}\frac{d^2 g}{dy^2} = -k_2^2, \qquad \frac{1}{h}\frac{d^2 h}{dz^2} = k_1^2 + k_2^2 = k_3^2$$

I made the choice of the signs for these constants because the boundary function is periodic in $x$ and in $y$, so I expect
sines and cosines along those directions. The separated solution is

$$(A \sin k_1 x + B \cos k_1 x)(C \sin k_2 y + D \cos k_2 y)\left(E e^{k_3 z} + F e^{-k_3 z}\right) \tag{10.47}$$


What about the case for separation constants of zero? Yes, that's needed too; the average value of the potential on the
surface is $V_0/2$, so just as with the example leading to Eq. (10.43) this will have a constant term of that value. The
periodicity in $x$ is $2a$ and in $y$ it is $2b$, so this determines

$$k_1 = n\pi/a, \quad k_2 = m\pi/b \qquad\text{then}\qquad k_3 = \sqrt{\frac{n^2\pi^2}{a^2} + \frac{m^2\pi^2}{b^2}}, \qquad n, m = 1, 2, \ldots$$


where $n$ and $m$ are independent integers. Use the experience that led to Eq. (10.45) to write $V$ on the surface as a sum
of the constant $V_0/2$ and a function that is odd in both $x$ and in $y$. As there, the odd function in $x$ will be represented
by a sum of sines in $x$, and the same statement will hold for the $y$ coordinate. This leads to the form of the sum

$$V(x,y,z) = \frac{1}{2}V_0 + \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} \alpha_{nm} \sin\left(\frac{n\pi x}{a}\right) \sin\left(\frac{m\pi y}{b}\right) e^{-k_{nm} z}$$

where $k_{nm}$ is the $k_3$ of the preceding equation. What happened to the other term in $z$, the one with the positive
exponent? Did I say that I'm looking for solutions in the domain $z > 0$?

At z = 0 this must match the boundary conditions stated, and as before, the orthogonality of the sines on the two 
domains allows you to determine the coefficients. You simply have to do two integrals instead of one. See problem 10.19. 
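
The two integrals can be checked numerically before doing them by hand. The sketch below assumes the picture is a checkerboard, so that the odd part of the surface potential factors as $(V_0/2)\,s(x)\,s(y)$ with $s = +1$ on the first half period and $-1$ on the second; under that assumption each coefficient is a product of two one-dimensional integrals. The function names and the sample values $a = 1$, $b = 2$ are hypothetical.

```python
import math

def square_wave_sine_coefficient(n, half_period=1.0, steps=4000):
    """(1/a) * integral over 0..2a of s(x) sin(n pi x / a) dx by the midpoint rule,
    where s(x) = +1 on (0, a) and -1 on (a, 2a)."""
    a = half_period
    dx = 2 * a / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * dx
        s = 1.0 if x < a else -1.0
        total += s * math.sin(n * math.pi * x / a) * dx
    return total / a

def checkerboard_coefficient(n, m, V0=1.0, a=1.0, b=2.0):
    """Coefficient of sin(n pi x/a) sin(m pi y/b) for the odd part (V0/2) s(x) s(y)."""
    return 0.5 * V0 * square_wave_sine_coefficient(n, a) * square_wave_sine_coefficient(m, b)

for n, m in [(1, 1), (1, 3), (3, 3), (2, 1)]:
    expected = 8.0 / (math.pi**2 * n * m) if (n % 2 and m % 2) else 0.0
    print(n, m, checkerboard_coefficient(n, m), expected)
```

The printed pairs agree with $8V_0/(\pi^2 nm)$ for odd $n$ and $m$ and vanish otherwise, independent of the choice of $a$ and $b$.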


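$$V(x, y, z > 0) = \frac{1}{2}V_0 + \frac{8V_0}{\pi^2} \sum_{\text{odd } n}\ \sum_{\text{odd } m} \frac{1}{nm} \sin\left(\frac{n\pi x}{a}\right) \sin\left(\frac{m\pi y}{b}\right) e^{-k_{nm} z} \tag{10.48}$$

10.7 Cylindrical Coordinates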

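Rectangular coordinates are not always the right choice. Cylindrical, spherical, and other choices are often needed. For
cylindrical coordinates, the gradient and divergence are, from Eqs. (9.24) and (9.15)

$$\nabla V = \hat r\,\frac{\partial V}{\partial r} + \hat\phi\,\frac{1}{r}\frac{\partial V}{\partial \phi} + \hat z\,\frac{\partial V}{\partial z} \qquad\text{and}\qquad \nabla\cdot\vec v = \frac{1}{r}\frac{\partial (r v_r)}{\partial r} + \frac{1}{r}\frac{\partial v_\phi}{\partial \phi} + \frac{\partial v_z}{\partial z}$$

Combine these to get the Laplacian in cylindrical coordinates.

$$\nabla\cdot\nabla V = \nabla^2 V = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial V}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2 V}{\partial \phi^2} + \frac{\partial^2 V}{\partial z^2} \tag{10.49}$$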


For electrostatics, the equation remains $\nabla^2 V = 0$, and you can approach it the same way as before, using separation
of variables. I'll start with the special case for which everything is independent of $z$. Assume a solution of the form
$V = f(r)g(\phi)$, then

$$\nabla^2 (fg) = g\,\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial f}{\partial r}\right) + f\,\frac{1}{r^2}\frac{\partial^2 g}{\partial \phi^2} = 0$$

Multiply this by $r^2$ and divide by $f(r)g(\phi)$ to get

$$\frac{r}{f}\frac{d}{dr}\left(r\frac{df}{dr}\right) + \frac{1}{g}\frac{d^2 g}{d\phi^2} = 0$$

This is separated. The first term depends on $r$ alone, and the second term on $\phi$ alone. For this to hold the terms must
be constants.

$$\frac{r}{f}\frac{d}{dr}\left(r\frac{df}{dr}\right) = \alpha \qquad\text{and}\qquad \frac{1}{g}\frac{d^2 g}{d\phi^2} = -\alpha \tag{10.50}$$


The second equation, in $\phi$, is familiar. If $\alpha$ is positive this is a harmonic oscillator, and that is the most common way
this solution is applied. I'll then look at the case for $\alpha \ge 0$, for which the substitution $\alpha = n^2$ makes sense.

$$\alpha = n^2 > 0: \qquad \frac{d^2 g}{d\phi^2} = -n^2 g \;\Longrightarrow\; g(\phi) = A\cos n\phi + B\sin n\phi$$

$$\qquad\qquad r^2\frac{d^2 f}{dr^2} + r\frac{df}{dr} - n^2 f = 0 \;\Longrightarrow\; f(r) = Cr^n + Dr^{-n}$$

For $\alpha = 0$ the same two equations integrate directly to $g(\phi) = A + B\phi$ and $f(r) = C + D\ln r$.


There's not yet a restriction that $n$ is an integer, though that's often the case. Verifying the last solution for $f$ is easy.
A general solution that is the sum of all these terms is 


$$V(r,\phi) = (C_0 + D_0 \ln r)(A_0 + B_0\phi) + \sum_{n=1}^{\infty} \left(C_n r^n + D_n r^{-n}\right)\left(A_n \cos n\phi + B_n \sin n\phi\right) \tag{10.51}$$
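
The verification of the radial solutions takes one line. As a sketch, try a power $f = r^s$ in the radial equation (the exponent $s$ is only a trial symbol introduced here, not notation from the text):

```latex
f = r^s:\qquad r^2\frac{d^2 f}{dr^2} + r\frac{df}{dr} - n^2 f
  = \bigl[s(s-1) + s - n^2\bigr]\, r^s = (s^2 - n^2)\, r^s = 0
  \quad\Longrightarrow\quad s = \pm n
```

For $n = 0$ the two roots coincide, and the second independent solution is the $\ln r$ that appears in the first term of Eq. (10.51).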


Some of these terms should be familiar:

$C_0 A_0$ is just a constant potential.

$D_0 A_0 \ln r$ is the potential of a uniform line charge; $d\ln r/dr = 1/r$, and that is the way that the electric field drops off
with distance from the axis.

$C_1 A_1 r\cos\phi$ is the potential of a uniform field (as is the $r\sin\phi$ term). Write this in the form $C_1 A_1 r\cos\phi = C_1 A_1 x$,
and the gradient of this is $C_1 A_1 \hat x$. The sine gives $\hat y$.

See problem 10.24. 

Example 

A conducting wire, radius $R$, is placed in a uniform electric field $\vec E_0$, and perpendicular to it. Put the wire along the
$z$-axis and call the positive $x$-axis the direction that the field points. That's $\phi = 0$. In the absence of the wire, the
potential for the uniform field is $V = -E_0 x = -E_0 r\cos\phi$, because $-\nabla V = E_0\hat x$. The total solution will be in the
form of Eq. (10.51).

Now turn the general form of the solution into a particular one for this problem. The entire range of $\phi$ from 0
to $2\pi$ appears here; you can go all the way around the origin and come back to where you started. The potential is
a function, meaning that it's single valued, and that eliminates $B_0\phi$. It also implies that all the $n$ are integers. The
applied field has a potential that is even in $\phi$. That means that you should expect the solution to be even in $\phi$ too.
Does it really? You also have to note that the physical system and its attendant boundary conditions are even in $\phi$:
it's a cylinder. Then too, the first few times that you do this sort of problem you should see what happens to the odd
terms; what makes them vanish? I won't eliminate the $\sin n\phi$ terms from the start, but I'll calculate them and show that
they do vanish.

$$V(r,\phi) = (C_0 + D_0 \ln r)A_0 + \sum_{n=1}^{\infty} \left(C_n r^n + D_n r^{-n}\right)\left(A_n \cos n\phi + B_n \sin n\phi\right)$$

Carrying along all these products of (still unknown) factors such as $D_n A_n$ is awkward. It looks neater and is
easier to follow if I combine and rename some of them.

$$V(r,\phi) = C_0 + D_0 \ln r + \sum_{n=1}^{\infty} \left(C_n r^n + D_n r^{-n}\right)\cos n\phi + \sum_{n=1}^{\infty} \left(C'_n r^n + D'_n r^{-n}\right)\sin n\phi \tag{10.52}$$

As $r \to \infty$, the potential looks like $-E_0 r\cos\phi$. That implies that $C_n = 0$ for $n > 1$, and that $C'_n = 0$ for all $n$, and
that $C_1 = -E_0$.

Now use the boundary conditions on the cylinder. It is a conductor, so in this static case the potential is a constant
all through it, in particular on the surface. I may as well take that constant to be zero, though it doesn't really matter.

$$V(R,\phi) = 0 = C_0 + D_0 \ln R + \sum_{n=1}^{\infty} \left(C_n R^n + D_n R^{-n}\right)\cos n\phi + \sum_{n=1}^{\infty} \left(C'_n R^n + D'_n R^{-n}\right)\sin n\phi$$

Multiply by $\sin m\phi$ and integrate over $\phi$. The trigonometric functions are orthogonal, so all that survives is

$$0 = \left(C'_m R^m + D'_m R^{-m}\right)\pi \qquad\text{all } m \ge 1$$

That gets rid of all the rest of the sine terms as promised: $D'_m = 0$ for all $m$ because $C'_m = 0$ for all $m$. Now repeat
for $\cos m\phi$.

$$0 = C_0 + D_0 \ln R \quad (m = 0) \qquad\text{and}\qquad 0 = \left(C_m R^m + D_m R^{-m}\right)\pi \quad (m > 0)$$

All of the $C_m = 0$ for $m > 1$, so this says that the same applies to $D_m$. The $m = 1$ equation determines $D_1$ in terms
of $C_1$ and then $E_0$.

$$D_1 = -C_1 R^2 = +E_0 R^2$$

Only the $C_0$ and $D_0$ terms are left, and that requires another boundary condition. When specifying the problem initially,
I didn't say whether or not there is any charge on the wire. In such a case you could naturally assume that it is zero, but
you have to say so explicitly because that affects the final result. Make it zero. That kills the $\ln r$ term. The reason for
that goes back to the interpretation of this term. Its negative gradient is the electric field, and that would be $-1/r$, the
field of a uniform line charge. If I assume there isn't one, then $D_0 = 0$ and so the same for $C_0$. Put this all together and

$$V(r,\phi) = -E_0 r\cos\phi + E_0\frac{R^2}{r}\cos\phi \tag{10.53}$$

The electric field that this presents is, from Eq. (9.24)

$$\vec E = -\nabla V = E_0\left(\hat r\cos\phi - \hat\phi\sin\phi\right) - E_0 R^2\left(-\hat r\,\frac{1}{r^2}\cos\phi - \hat\phi\,\frac{1}{r^2}\sin\phi\right)$$

$$= E_0\,\hat x + E_0\frac{R^2}{r^2}\left(\hat r\cos\phi + \hat\phi\sin\phi\right)$$

As a check to see what this looks like, what is the electric field at the surface of the cylinder?

$$\vec E(R,\phi) = E_0\left(\hat r\cos\phi - \hat\phi\sin\phi\right) - E_0 R^2\left(-\hat r\,\frac{1}{R^2}\cos\phi - \hat\phi\,\frac{1}{R^2}\sin\phi\right) = 2E_0\,\hat r\cos\phi$$

It's perpendicular to the surface, as it should be. At the left and right, $\phi = 0, \pi$, it is twice as large as the uniform field
alone would be at those points.
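
The same check can be done numerically. This is a minimal sketch, assuming the potential of Eq. (10.53) and differentiating it by central differences; the function names, the step size `h`, and the units $E_0 = R = 1$ are illustrative choices.

```python
import math

def potential(r, phi, E0=1.0, R=1.0):
    """Eq. (10.53): V = -E0 r cos(phi) + E0 (R^2 / r) cos(phi)."""
    return (-E0 * r + E0 * R**2 / r) * math.cos(phi)

def field_polar(r, phi, E0=1.0, R=1.0, h=1e-6):
    """(E_r, E_phi) from E = -grad V in polar form, using central differences.

    The formula for V is simply evaluated on both sides of r, even at r = R,
    where the point r - h lies inside the wire; that is harmless for a
    derivative check but is not the physical potential there.
    """
    E_r = -(potential(r + h, phi, E0, R) - potential(r - h, phi, E0, R)) / (2 * h)
    E_phi = -(potential(r, phi + h, E0, R) - potential(r, phi - h, E0, R)) / (2 * h * r)
    return E_r, E_phi

for phi in (0.0, math.pi / 3, math.pi / 2, math.pi):
    E_r, E_phi = field_polar(1.0, phi)
    print(f"phi = {phi:.3f}: V = {potential(1.0, phi):+.1e}, E_r = {E_r:+.4f}, E_phi = {E_phi:+.1e}")
```

On the surface the potential and the tangential component vanish and the radial component follows $2E_0\cos\phi$, as found above.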
Problems

10.1 The specific heat of a particular type of stainless steel (CF8M) is 500 J/kg·K. Its thermal conductivity is 16.2 W/m·K
and its density is 7750 kg/m$^3$. A slab of this steel 1.00 cm thick is at a temperature 100°C and it is placed into ice water.
Assume the simplest boundary condition that its surface temperature stays at zero, and find the internal temperature at
later times. When is the 2nd term in the series, Eq. (10.15), only 5% of the 1st? Sketch the temperature distribution
then, indicating the scale correctly.

10.2 In Eq. (10.13) I eliminated the n = 0 solution by a fallacious argument. What is a in this case? This gives one 
more term in the sum, Eq. (10.14). Show that with the boundary conditions stated, this extra term is zero anyway (this 
time). 

10.3 In Eq. (10.14) you have the sum of many terms. Does it still satisfy the original differential equation, Eq. (10.3)? 

10.4 In the example Eq. (10.15) the final temperature was zero. What if the final temperature is $T_1$? Or what if I use
the Kelvin scale, so that the final temperature is 273°? Add the appropriate extra term, making sure that you still have
a solution to the original differential equation and that the boundary conditions are satisfied.

10.5 In the example Eq. (10.15) the final temperature was zero on both sides. What if it's zero on just the side at
$x = 0$ while the side at $x = L$ stays at $T_0$? What is the solution now?

Ans: $T_0 x/L + (2T_0/\pi) \sum_{n=1}^{\infty} (1/n) \sin(n\pi x/L)\, e^{-n^2\pi^2 Dt/L^2}$

10.6 You have a slab of material of thickness $L$ and at a uniform temperature $T_0$. The side at $x = L$ is insulated so
that heat can't flow in or out of that surface. By Eq. (10.1), this tells you that $\partial T/\partial x = 0$ at that surface. Plunge
the other side into ice water at temperature $T = 0$ and find the temperature inside at later time. The boundary
condition on the $x = 0$ surface is the same as in the example in the text, $T(0, t) = 0$. (a) Separate variables and
find the appropriate separated solutions for these boundary conditions. Are the separated solutions orthogonal? Use the
techniques of Eq. (5.15). (b) When the lowest order term has dropped to where its contribution to the temperature at
$x = L$ is $T_0/2$, how big is the next term in the series? Sketch the temperature distribution in the slab at that time.
Ans: $(4T_0/\pi) \sum_{n=0}^{\infty} \frac{1}{2n+1} \sin\bigl[(n + \tfrac12)\pi x/L\bigr]\, e^{-(n+1/2)^2\pi^2 Dt/L^2}$, $\ -9.43 \times 10^{-5}\, T_0$

10.7 In the analysis leading to Eq. (10.26) the temperature at $y = b$ was set to $T_0$. If instead, you have the temperature
at $x = a$ set to $T_0$ with all the other sides at zero, write down the answer for the temperature within the rod. Now use
the fact that Eq. (10.21) is linear to write down the solution if both the sides at $y = b$ and $x = a$ are set to $T_0$.

10.8 In leading up to Eq. (10.25) I didn't examine the third possibility for the separation constant, that it's zero. Do so.

10.9 Look at the boundary condition of Eq. (10.22) again. Another way to solve this problem is to use the solution for
which the separation constant is zero, and to use it to satisfy the conditions at $y = 0$ and $y = b$. You will then have one
term in the separated solution that is $T_0 y/b$, and that means that in Eq. (10.23) you will have to choose the separation
variable to be positive instead of negative. Why? Because now all the rest of the terms in the sum over separated
solutions must vanish at $y = 0$ and $y = b$. You've already satisfied the boundary conditions on those surfaces by using
the $T_0 y/b$ term. Now you have to satisfy the boundary conditions on $x = 0$ and $x = a$ because the total temperature
there must be zero. That in turn means that the sum over all the rest of the separated terms must add to $-T_0 y/b$
at $x = 0$ and $x = a$. When you analyze this solution in the same spirit as the analysis of Eq. (10.26), compare the
convergence properties of that solution to your new one. In particular, look at $a \ll b$ and $a \gg b$ to see which version
converges better in each case.

Ans: $T_0 y/b + (2T_0/\pi) \sum_{n=1}^{\infty} \bigl[(-1)^n/n\bigr] \sin(n\pi y/b) \cosh\bigl(n\pi(x - a/2)/b\bigr) / \cosh(n\pi a/2b)$

10.10 Finish the re-analysis of the electrostatic boundary value problem Eq. (10.45) starting from Eq. (10.46). This will
get the potential for $z \ne 0$ with perhaps less work than before.

10.11 Examine the solution Eq. (10.42) at z = 0 in the light of problem 5.11. 

10.12 A thick slab of material is alternately heated and cooled at its surface so that its surface temperature oscillates as

$$T(0,t) = \begin{cases} T_1 & (0 < t < t_0) \\ -T_1 & (t_0 < t < 2t_0) \end{cases} \qquad T(0, t + 2t_0) = T(0,t)$$

That is, the period of the oscillation is $2t_0$. Find the temperature inside the material, for $x > 0$. How does this behavior
differ from the solution in Eq. (10.20)?

Ans: $(4T_1/\pi) \sum_{k=0}^{\infty} \bigl(1/(2k+1)\bigr)\, e^{-\beta_k x} \sin\bigl((2k+1)\omega t - \beta_k x\bigr)$; $\ \omega = \pi/t_0$

10.13 Fill in the missing steps in finding the solution, Eq. (10.34). 

10.14 A variation on the problem of the alternating potential strips in section 10.6. Place a grounded conducting sheet
parallel to the $x$-$y$ plane at a height $z = d$ above it. The potential there is then $V(x,y,z = d) = 0$. Solve for the
potential in the gap between $z = 0$ and $z = d$. A suggestion: you may find it easier to turn the coordinate system over
so that the grounded sheet is at $z = 0$ and the alternating strips are at $z = d$. This switch of coordinates is in no way
essential, but it is a bit easier. Also, I want to point out that you will need to consider the case for which the separation
constant in Eq. (10.37) is zero.

10.15 Start from Eq. (10.52) and repeat the example there, but assume that the conducting wire is in an external
electric field $E_0\hat y$ instead of $E_0\hat x$. Repeat the calculation for the potential and for the electric field, filling in the details
of the missing steps.

10.16 A very long conducting cylindrical shell of radius $R$ is split in two along lines parallel to its axis. The two halves
are wired to a circuit that places one half at potential $V_0$ and the other half at potential $-V_0$. (a) What is the potential
everywhere inside the cylinder? Use the results of section 10.7 and assume a solution of the form



Match the boundary condition that

$$V(R,\phi) = \begin{cases} V_0 & (0 < \phi < \pi) \\ -V_0 & (\pi < \phi < 2\pi) \end{cases}$$

I picked the axis for $\phi = 0$ pointing toward the split between the cylinders. No particular reason, but you have to make
a choice. I make the approximation that the cylinder is infinitely long so that $z$ dependence doesn't enter. Also, the two
halves of the cylinder almost touch so I'm neglecting the distance between them.

(b) What is the electric field, $-\nabla V$, on the central axis? Is this answer more or less what you would estimate before
solving the problem? Ans: (b) $E = 4V_0/\pi R$.


10.17 Solve the preceding problem outside the cylinder. The integer $n$ can be either positive or negative, and this time
you'll need the negative values. (And why must $n$ be an integer?)

Ans: $(4V_0/\pi) \sum_{n\ \text{odd}} (1/n)(R/r)^n \sin n\phi$

10.18 In the split cylinder of problem 10.16, insert a coaxial wire of radius $R_1 < R$. It is at zero potential. Now
what is the potential in the domain $R_1 < r < R$? You will need both the positive and negative $n$ values,
$\sum \left(A_n r^n + B_n r^{-n}\right) \sin n\phi$.

Ans: $(4V_0/\pi) \sum_{m\ \text{odd}} \sin m\phi\, \bigl[-R_1^{-m} r^m + R_1^m r^{-m}\bigr] \big/ m\bigl[R^{-m} R_1^m - R_1^{-m} R^m\bigr]$


10.19 Fill in the missing steps in deriving Eq. (10.48).

10.20 Analyze how rapidly the solution Eq. (10.48) approaches a constant as $z$ increases from zero. Compare Eq. (10.44).


10.21 A broad class of second order linear homogeneous differential equations can, with some manipulation, be put into 
the form (Sturm-Liouville) 

$$\bigl(p(x)u'\bigr)' + q(x)u = \lambda w(x)u$$

Assume that the functions p, q, and w are real, and use manipulations much like those that led to the identity Eq. (5.15). 
Derive the analogous identity for this new differential equation. When you use separation of variables on equations 
involving the Laplacian you will commonly come to an ordinary differential equation of exactly this form. The precise 
details will depend on the coordinate system you are using as well as other aspects of the PDE. 

10.22 Carry on from Eq. (10.31) and deduce the separated solution that satisfies these boundary conditions. Show that
it is equivalent to Eq. (10.32).

10.23 The Laplacian in cylindrical coordinates is Eq. (10.49). Separate variables for the equation $\nabla^2 V = 0$ and you will
see that the equations in $z$ and $\phi$ are familiar. The equation in the $r$ variable is less so, but you've seen it (almost) in
Eqs. (4.18) and (4.22). Make a change of variables in the $r$-differential equation, $r = kr'$, and turn it into exactly the
form described there.

10.24 In the preceding problem suppose that there's no $z$-dependence. Look at the case where the separation constant is
zero for both the $r$ and $\phi$ functions, finally assembling the product of the two for another solution of the whole equation.
These results provide four different solutions, a constant, a function of $r$ alone, a function of $\phi$ alone, and a function of
both. In each of these cases, assume that these functions are potentials $V$ and that $\vec E = -\nabla V$ is the electric field from
each potential. Sketch equipotentials for each case, then sketch the corresponding vector fields that they produce (a lot
of arrows).

10.25 Do problem 8.23 and now solve it, finding all solutions to the wave equation. 

Ans: f(x — vt) + g(x + vt) 

10.26 Use the results of problem 10.24 to find the potential in the corner between two very large metal plates
set at right angles. One is at potential zero, the other at potential $V_0$. Compute the electric field, $-\nabla V$, and
draw the results. Ans: $-2V_0\hat\phi/\pi r$

10.27 A thin metal sheet has a straight edge for one of its boundaries. Another thin metal sheet is cut the same way.
The two straight boundaries are placed in the same plane and almost, but not quite touching. Now apply a potential
difference between them, putting one at a voltage $V_0$ and the other at $-V_0$. In the region of space near to the almost
touching boundary, what is the electric potential? From that, compute and draw the electric field.

10.28 A slab of heat conducting material lies between coordinates $x = -L$ and $x = +L$, which are at temperatures
$T_1$ and $T_2$ respectively. In the steady state ($\partial T/\partial t = 0$), what is the temperature distribution inside? Now express the
result in cylindrical coordinates around the $z$-axis and show how it matches the sum of cylindrical coordinate solutions
of $\nabla^2 T = 0$ from problem 10.15. What if the surfaces of the slab had been specified at $y = -L$ and $y = +L$ instead?

10.29 The result of problem 10.16 has a series of terms that look like $(x^n/n)\sin n\phi$ (odd $n$). You can use complex
exponentials, do a little rearranging and factoring, and sum this series. Along the way you will have to figure out what
the sum $z + z^3/3 + z^5/5 + \cdots$ is. Refer to section 2.7. Finally of course, the answer is real, and if you look hard
you may find a simple interpretation for the result. Be sure you've done problem 10.24 before trying this last step.
Ans: $2V_0(\theta_1 + \theta_2)/\pi$. You still have to decipher what $\theta_1$ and $\theta_2$ are.

10.30 Sum the series Eq. (10.27) to get a closed-form analytic expression for the temperature distribution. You
will find the techniques of section 5.7 useful, but there are still a lot of steps. Recall also $\ln(re^{i\theta}) = \ln r + i\theta$.
Ans: $(2T_0/\pi) \tan^{-1}\bigl[\sin(\pi x/a)/\sinh(\pi(b - y)/a)\bigr]$

10.31 A generalization of the problem specified in Eq. (10.22). Now the four sides have temperatures given respectively
to be the constants $T_1$, $T_2$, $T_3$, $T_4$. Note: with a little bit of foresight, you won't have to work very hard at all to solve
this.

10.32 Use the electrostatic equations from problem 9.21 and assume that the electric charge density is given by $\rho =
\rho_0 a/r$, where this is in cylindrical coordinates. (a) What cylindrically symmetric electric field comes from this charge
distribution? (b) From $\vec E = -\nabla V$, what potential function $V$ do you get?

10.33 Repeat the preceding problem, but now interpret $r$ as referring to spherical coordinates. What is $\nabla^2 V$?

10.34 The Laplacian in spherical coordinates is Eq. (9.43). The electrostatic potential equation is $\nabla^2 V = 0$ just as
before, but now take the special case of azimuthal symmetry so that the potential function is independent of $\phi$. Apply the
method of separation of variables to find solutions of the form $f(r)g(\theta)$. You will get two ordinary differential equations
for $f$ and $g$. The second of these equations is much simpler if you make the change of independent variable $x = \cos\theta$.
Use the chain rule a couple of times to do so, showing that the two differential equations are

$$(1 - x^2)\frac{d^2 g}{dx^2} - 2x\frac{dg}{dx} + Cg = 0 \qquad\text{and}\qquad r^2\frac{d^2 f}{dr^2} + 2r\frac{df}{dr} - Cf = 0$$

10.35 For the preceding equations show that there are solutions of the form $f(r) = Ar^n$, and recall the analysis in
section 4.11 for the $g$ solutions. What values of the separation constant $C$ will allow solutions that are finite as $x \to \pm 1$
($\theta \to 0, \pi$)? What are the corresponding functions of $r$? Don't forget that there are two solutions to the second order
differential equation for $f$: two roots to a quadratic equation.

10.36 Write out the separated solutions to the preceding problem (the ones that are finite as $\theta$ approaches 0 or $\pi$)
for the two smallest allowed values of the separation constant $C$: 0 and 2. In each of the four cases, interpret and sketch
the potential and its corresponding electric field, $-\nabla V$. How do you sketch a potential? Draw equipotentials.


10.37 From the preceding problem you can have a potential, a solution of Laplace's equation, in the form $(Ar +
B/r^2)\cos\theta$. Show that by an appropriate choice of $A$ and $B$, this has an electric field that for large distances from
the origin looks like $E_0\hat z$, and that on the sphere $r = R$ the total potential is zero: a grounded, conducting sphere.
What does the total electric field look like for $r > R$; sketch some field lines. Start by asking what the electric field is
as $r \to R$.

10.38 Similar to problem 10.16, but the potential on the cylinder is

$$V(R,\phi) = \begin{cases} V_0 & (0 < \phi < \pi/2 \text{ and } \pi < \phi < 3\pi/2) \\ -V_0 & (\pi/2 < \phi < \pi \text{ and } 3\pi/2 < \phi < 2\pi) \end{cases}$$


Draw the electric field in the region near r = 0. 

10.39 What is the area charge density on the surface of the cylinder where the potential is given by Eq. (10.53)? 


Numerical Analysis 

You could say that some of the equations that you encounter in describing physical systems can't be solved in terms 
of familiar functions and that they require numerical calculations to solve. It would be misleading to say this however, 
because the reality is quite the opposite. Most of the equations that describe the real world are sufficiently complex that 
your only hope of solving them is to use numerical methods. The simple equations that you find in introductory texts 
are there because they can be solved in terms of elementary calculations. When you start to add reality, you quickly 
reach a point at which no amount of clever analytical ability will get you a solution. That becomes the subject of this 
chapter. In all of the examples that I present I'm just showing you a taste of the subject, but I hope that you will see 
the essential ideas of how to extend the concepts. 

11.1 Interpolation 

Given equally spaced tabulated data, the problem is to find a value between the tabulated points, and to estimate the 
error in doing so. As a first example, to find a value midway between given points use a linear interpolation: 

$$f(x_0 + h/2) \approx \frac{1}{2}\bigl[f(x_0) + f(x_0 + h)\bigr]$$

This gives no hint of the error. To compute an error estimate, it is convenient to transform the variables so that this
equation reads

$$f(0) \approx \frac{1}{2}\bigl[f(k) + f(-k)\bigr],$$

where the interval between data points is now $2k$. Use a power series expansion of $f$ to find an estimate of the error.

$$f(k) = f(0) + kf'(0) + \frac{1}{2}k^2 f''(0) + \cdots$$

$$f(-k) = f(0) - kf'(0) + \frac{1}{2}k^2 f''(0) + \cdots$$

Then

$$\frac{1}{2}\bigl[f(k) + f(-k)\bigr] \approx f(0) + \left[\frac{1}{2}k^2 f''(0)\right], \tag{11.1}$$

where the last term is your error estimate:

$$\text{Error} = \text{Estimate} - \text{Exact} = +k^2 f''(0)/2 = +h^2 f''(0)/8$$

And the relative error is (Estimate − Exact)/Exact.

As an example, interpolate the function $f(x) = 2^x$ between 0 and 1. Here $h = 1$.

$$2^{1/2} \approx \frac{1}{2}\bigl[2^0 + 2^1\bigr] = 1.5$$

The error term is

$$\text{error} \approx (\ln 2)^2\, 2^x/8 \quad\text{for } x = .5$$
$$= (.693)^2 (1.5)/8 = .090,$$

and of course the true error is $1.5 - 1.414 = .086$
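
The arithmetic of this example is easy to reproduce. The following is a small sketch (the function names are illustrative, not from the text); as in the example, it uses the interpolated value 1.5 in place of the unknown $2^{0.5}$ when evaluating $f''$.

```python
import math

def midpoint_linear(f, x0, h):
    """Linear interpolation halfway between x0 and x0 + h."""
    return 0.5 * (f(x0) + f(x0 + h))

def midpoint_error_estimate(second_derivative, h):
    """Leading error term h^2 f''/8 from Eq. (11.1), with k = h/2."""
    return h**2 * second_derivative / 8.0

f = lambda x: 2.0**x
estimate = midpoint_linear(f, 0.0, 1.0)
# f''(x) = (ln 2)^2 2^x; use the estimate itself in place of 2^0.5.
predicted = midpoint_error_estimate(math.log(2) ** 2 * estimate, 1.0)
actual = estimate - f(0.5)
print(estimate, predicted, actual)
```

The predicted and actual errors differ by a few percent, which is about what to expect from an estimate that keeps only the first neglected term of the series.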

You can write a more general interpolation method for an arbitrary point between $x_0$ and $x_0 + h$. The solution is
a simple extension of the above result.

The line passing through the two points of the graph is

$$y - f_0 = (x - x_0)(f_1 - f_0)/h, \qquad f_0 = f(x_0), \quad f_1 = f(x_0 + h).$$

At the point $x = x_0 + ph$ you have

$$y = f_0 + (ph)(f_1 - f_0)/h = f_0(1 - p) + f_1 p$$

As before, this approach doesn't suggest the error, but again, the Taylor series allows you to work it out to be
$\bigl[h^2 p(1 - p) f''(x_0 + ph)/2\bigr]$.
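
A minimal sketch of this two-point formula and its error estimate follows. The function names are hypothetical, and the test function $f(x) = x^2$ is chosen only because its second derivative is constant, so the leading error term is the whole error.

```python
def linear_interpolate(f0, f1, p):
    """Value at x0 + p*h on the straight line through (x0, f0) and (x0 + h, f1)."""
    return f0 * (1.0 - p) + f1 * p

def linear_error_estimate(second_derivative, h, p):
    """Leading error estimate h^2 p (1 - p) f'' / 2 (estimate minus exact)."""
    return h**2 * p * (1.0 - p) * second_derivative / 2.0

# Illustration with f(x) = x^2 on [0, 1], where f'' = 2 everywhere.
p = 0.25
estimate = linear_interpolate(0.0, 1.0, p)
print(estimate - p**2, linear_error_estimate(2.0, 1.0, p))
```

Setting $p = 1/2$ in the error estimate recovers the $h^2 f''/8$ of the midpoint case.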

The use of only two points to do an interpolation ignores the data available in the rest of the table. By using
more points, you can greatly improve the accuracy. The simplest example of this method is the 4-point interpolation to
find the function halfway between the data points. Again, the independent variable has an increment $h = 2k$, so the
problem can be stated as one of finding the value of $f(0)$ given $f(\pm k)$ and $f(\pm 3k)$.

$$f(k) + f(-k) = 2f(0) + k^2 f''(0) + \frac{1}{12}k^4 f''''(0) + \cdots$$

$$f(3k) + f(-3k) = 2f(0) + 9k^2 f''(0) + \frac{81}{12}k^4 f''''(0) + \cdots$$

I want to isolate $f(0)$ from this. The biggest term after the $f(0)$ is in $k^2 f''(0)$, so I'll eliminate this.

$$\bigl[f(3k) + f(-3k)\bigr] - 9\bigl[f(k) + f(-k)\bigr] \approx -16f(0) + \left[\frac{81}{12} - \frac{9}{12}\right] k^4 f''''(0) \tag{11.2}$$

$$f(0) \approx \frac{1}{16}\bigl[-f(-3k) + 9f(-k) + 9f(k) - f(3k)\bigr] - \left[-\frac{3}{8}k^4 f''''(0)\right]. \tag{11.3}$$

The error estimate is then $-3h^4 f''''(0)/128$.

To apply this, take the same example as before, $f(x) = 2^x$ at $x = .5$

$$2^{1/2} \approx \frac{1}{16}\bigl[-2^{-1} + 9\cdot 2^0 + 9\cdot 2^1 - 2^2\bigr] = \frac{45}{32} = 1.40625,$$

and the error is $1.40625 - 1.41421 = -.008$, a tenfold improvement over the previous interpolation despite the fact that
the function changes markedly in this interval and you shouldn't expect interpolation to work very well here.
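
As a sketch, the four-point formula of Eq. (11.3), without its error term, is a one-line function. The name `four_point_midpoint` and the argument order are illustrative choices.

```python
def four_point_midpoint(f_m3, f_m1, f_p1, f_p3):
    """Estimate f(0) from f(-3k), f(-k), f(k), f(3k), dropping the error term."""
    return (-f_m3 + 9.0 * f_m1 + 9.0 * f_p1 - f_p3) / 16.0

f = lambda x: 2.0**x
# Tabulated points at x = -1, 0, 1, 2 (h = 1, k = 1/2), interpolating to x = 0.5.
estimate = four_point_midpoint(f(-1.0), f(0.0), f(1.0), f(2.0))
print(estimate, estimate - f(0.5))
```

The weights $-1, 9, 9, -1$ sum to 16, so a constant function is reproduced exactly, and the construction guarantees the same for any cubic.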

11.2 Solving equations 

Example: $\sin x - x/2 = 0$

From the first graph, the equation clearly has three real solutions, but finding them is the problem. The first
method for solving $f(x) = 0$ is Newton's method. The basic idea is that over a small enough region, everything is more
or less linear. This isn't true of course, so don't be surprised that this method doesn't always work. But then, nothing
always works.



A general picture of a function with a root is the second graph. In that case, observe that if $x_0$ is a first
approximation to the root of $f$, the straight line tangent to the curve can be used to calculate an improved approximation.
The equation of this line is

$$y - f(x_0) = f'(x_0)(x - x_0)$$

The root of this line is defined by $y = 0$, with solution

$$x = x_0 - f(x_0)/f'(x_0)$$

Call this solution $x_1$. You can use this in an iterative procedure to find

$$x_2 = x_1 - f(x_1)/f'(x_1), \tag{11.4}$$

and in turn $x_3$ is defined in terms of $x_2$ etc.

Example: Solve $\sin x - x/2 = 0$. From the graph, a plausible guess for a root is $x_0 = 2$.

$$\begin{aligned}
x_1 &= x_0 - (\sin x_0 - x_0/2)/(\cos x_0 - 1/2) = 1.900995594 & f(x_1) &= -.00452 \\
x_2 &= x_1 - (\sin x_1 - x_1/2)/(\cos x_1 - 1/2) = 1.895511645 & f(x_2) &= -.000014 \\
x_3 &= x_2 - (\sin x_2 - x_2/2)/(\cos x_2 - 1/2) = 1.895494267 & f(x_3) &= -1.4 \times 10^{-10}
\end{aligned}$$
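
The iteration of Eq. (11.4) can be sketched in a few lines of Python. This is an illustration rather than a robust root finder: the helper name `newton` and the fixed number of steps are assumptions made here, and nothing guards against $f'(x) = 0$.

```python
import math

def newton(f, fprime, x0, steps):
    """Apply Eq. (11.4) `steps` times and return every iterate, starting with x0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

f = lambda x: math.sin(x) - x / 2
fprime = lambda x: math.cos(x) - 0.5

iterates = newton(f, fprime, 2.0, 3)
for x in iterates:
    print(f"{x:.9f}   f(x) = {f(x):.2e}")
```

Printing $f(x)$ alongside each iterate shows the residual shrinking roughly quadratically, as in the example above.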


Such iterative procedures are ideal for use on a computer, but use them with caution, as a simple example shows:

$$f(x) = x^{1/3}.$$

Instead of the root $x = 0$, the iterations in this first graph carry the supposed solution infinitely far away. This happens
here because the higher derivatives neglected in the straight line approximation are large near the root.



A milder form of non-convergence can occur if at the root the curvature changes sign and is large, as in the second 
graph. This can lead to a limit cycle where the iteration simply oscillates from one side of the root to the other without 
going anywhere. As I said earlier this doesn't always work. 

A non-graphical derivation of this method starts from a Taylor series: If $z_0$ is an approximate root and $z_0 + \epsilon$ is a
presumed exact root, then

$$f(z_0 + \epsilon) = 0 = f(z_0) + \epsilon f'(z_0) + \cdots$$

Neglecting higher terms then,

$$\epsilon = -f(z_0)/f'(z_0), \quad\text{and}\quad z_1 = z_0 + \epsilon = z_0 - f(z_0)/f'(z_0), \tag{11.5}$$

as before. Here z appears instead of x to remind you that this method is just as valid for complex functions as for real 
ones (and has as many pitfalls). 

There is a simple variation on this method that can be used to speed convergence where it is poor or to bring 
about convergence where the technique would otherwise break down. 

$$x_1 = x_0 - w f(x_0)/f'(x_0) \tag{11.6}$$

Here $w$ is a factor that can be chosen greater than one to increase the correction or less than one to decrease it. Which
one to do is more an art than a science (1.5 and 0.5 are common choices). You can easily verify that any choice of $w$
between 0 and 2/3 will cause convergence for the solution of $x^{1/3} = 0$. You can also try this method on the solution
of $f(x) = x^2 = 0$. A straightforward iteration will certainly converge, but with painful slowness. The choice of $w > 1$
improves this considerably.
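
Both claims are easy to try numerically. The sketch below assumes a real cube root built from `math.copysign` (Python's `**` returns a complex number for a negative base) and uses an invented helper name, `relaxed_newton`. For $x^{1/3}$ each step multiplies $x$ by $(1 - 3w)$, and for $x^2$ by $(1 - w/2)$.

```python
import math

def relaxed_newton(f, fprime, x0, w, steps):
    """Iterate Eq. (11.6) a fixed number of times with relaxation factor w."""
    x = x0
    for _ in range(steps):
        x = x - w * f(x) / fprime(x)
    return x

# Real cube root and its derivative, valid for x != 0
cbrt = lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x)
cbrt_prime = lambda x: abs(x) ** (-2.0 / 3.0) / 3.0

diverging = relaxed_newton(cbrt, cbrt_prime, 0.1, 1.0, 10)   # plain Newton: x -> -2x each step
converging = relaxed_newton(cbrt, cbrt_prime, 0.1, 0.5, 10)  # w = 0.5:      x -> -x/2 each step

square = lambda x: x * x
square_prime = lambda x: 2.0 * x
slow = relaxed_newton(square, square_prime, 1.0, 1.0, 10)    # factor 1/2 per step
fast = relaxed_newton(square, square_prime, 1.0, 1.5, 10)    # factor 1/4 per step

print(diverging, converging, slow, fast)
```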

When Newton's method works well, it will typically double the number of significant figures at each iteration.

A drawback to this method is that it requires knowledge of $f'(x)$, and that may not be simple. An alternate
approach that avoids this starts from the picture in which a secant through the curve is used in place of a tangent at a
point.

Given $f(x_1)$ and $f(x_2)$, construct a straight line

$$y - f(x_2) = \left[\frac{f(x_2) - f(x_1)}{x_2 - x_1}\right](x - x_2)$$

This has its root at $y = 0$, or

$$x = x_2 - f(x_2)\,\frac{x_2 - x_1}{f(x_2) - f(x_1)} \tag{11.7}$$

This root is taken as $x_3$ and the method is iterated, substituting $x_2$ and $x_3$ for $x_1$ and $x_2$. As with Newton's method,
when it works, it works very well, but you must look out for the same type of non-convergence problems. This is called
the secant method.
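
A minimal sketch of the secant iteration, Eq. (11.7), applied to the same equation $\sin x - x/2 = 0$. The function name `secant`, the stopping tolerance, and the step limit are choices made for this illustration, not part of the method itself.

```python
import math

def secant(f, x1, x2, tol=1e-12, max_steps=50):
    """Iterate Eq. (11.7) until two successive estimates agree to within tol."""
    for _ in range(max_steps):
        f1, f2 = f(x1), f(x2)
        if f2 == f1:                 # flat secant: no new root can be computed
            break
        x3 = x2 - f2 * (x2 - x1) / (f2 - f1)
        x1, x2 = x2, x3              # substitute x2 and x3 for x1 and x2
        if abs(x2 - x1) < tol:
            break
    return x2

root = secant(lambda x: math.sin(x) - x / 2, 2.0, 1.9)
print(root)
```

No derivative is needed, at the price of supplying two starting points instead of one.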

11.3 Differentiation 

Given tabular or experimental data, how can you compute its derivative? 

Approximating the tangent by a secant, a good estimate for the derivative of $f$ at the midpoint of the $(x_1, x_2)$
interval is

$$[f(x_2) - f(x_1)]/(x_2 - x_1)$$

As usual, the geometric approach doesn't indicate the size of the error, so it's back to Taylor's series.
Given data at points $x = 0, \pm h, \pm 2h$, I want the derivative $f'(0)$.

$$f(h) = f(0) + hf'(0) + \frac{1}{2}h^2 f''(0) + \frac{1}{6}h^3 f'''(0) + \cdots$$

$$f(-h) = f(0) - hf'(0) + \frac{1}{2}h^2 f''(0) - \frac{1}{6}h^3 f'''(0) + \cdots$$

In order to isolate the term in $f'(0)$, it's necessary to eliminate the larger term $f(0)$, so subtract:

$$f(h) - f(-h) = 2hf'(0) + \frac{1}{3}h^3 f'''(0) + \cdots,$$

giving

$$\frac{1}{2h}\bigl[f(h) - f(-h)\bigr] \approx f'(0) + \left[\frac{1}{6}h^2 f'''(0)\right] \tag{11.8}$$

and the last term, in brackets, estimates the error in the straight line approximation.

The most obvious point about this error term is that it varies as $h^2$, and so indicates by how much the error
should decrease as you decrease the size of the interval. (How to estimate the factor $f'''(0)$ I'll come to presently.) This
method evaluates the derivative at one of the data points; you can make it more accurate if you evaluate it between the
points, so that the distance from where the derivative is being taken to where the data is available is smaller. As before,
let $h = 2k$, then

$$\frac{1}{2k}\bigl[f(k) - f(-k)\bigr] = f'(0) + \frac{1}{6}k^2 f'''(0) + \cdots,$$

or, in terms of $h$ with a shifted origin,

$$\frac{1}{h}\bigl[f(h) - f(0)\bigr] \approx f'(h/2) + \frac{1}{24}h^2 f'''(h/2), \tag{11.9}$$

and the error is only 1/4 as big.

As with interpolation methods, you can gain accuracy by going to higher order in the Taylor series,

$$f(h) - f(-h) = 2hf'(0) + \frac{1}{3}h^3 f'''(0) + \frac{1}{60}h^5 f^{v}(0) + \cdots$$

$$f(2h) - f(-2h) = 4hf'(0) + \frac{8}{3}h^3 f'''(0) + \frac{32}{60}h^5 f^{v}(0) + \cdots$$

To eliminate the largest source of error, the $h^3$ term, multiply the first equation by 8 and subtract the second.

$$8\bigl[f(h) - f(-h)\bigr] - \bigl[f(2h) - f(-2h)\bigr] = 12hf'(0) - \frac{24}{60}h^5 f^{v}(0) + \cdots,$$

or

$$f'(0) \approx \frac{1}{12h}\bigl[f(-2h) - 8f(-h) + 8f(h) - f(2h)\bigr] - \left[-\frac{1}{30}h^4 f^{v}(0)\right] \tag{11.10}$$

with an error term of order $h^4$.



11 — Numerical Analysis 




As an example of this method, let $f(x) = \sin x$ and evaluate the derivative at $x = 0.2$ by the 2-point formula and
the 4-point formula with $h = 0.1$:

2-point: $\frac{1}{0.2}[0.2955202 - 0.0998334] = 0.9784340$

4-point: $\frac{1}{1.2}[0.0 - 8 \times 0.0998334 + 8 \times 0.2955202 - 0.3894183] = 0.9800633$

$\cos 0.2 = 0.9800666$
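
The two formulas, Eqs. (11.8) and (11.10), translate directly into code. In this sketch the names `two_point` and `four_point` are invented, and the function is evaluated on demand rather than read from a table.

```python
import math

def two_point(f, x, h):
    """Centered difference, Eq. (11.8); truncation error of order h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)

def four_point(f, x, h):
    """Four-point formula, Eq. (11.10); truncation error of order h**4."""
    return (f(x - 2 * h) - 8 * f(x - h) + 8 * f(x + h) - f(x + 2 * h)) / (12 * h)

d2 = two_point(math.sin, 0.2, 0.1)
d4 = four_point(math.sin, 0.2, 0.1)
print(d2, d4, math.cos(0.2))
```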

Again, you have a more accurate formula by evaluating the derivative between the data points: $h = 2k$

$$f(k) - f(-k) = 2kf'(0) + \frac{1}{3}k^3 f'''(0) + \frac{1}{60}k^5 f^{v}(0)$$

$$f(3k) - f(-3k) = 6kf'(0) + \frac{27}{3}k^3 f'''(0) + \frac{243}{60}k^5 f^{v}(0)$$

$$27\bigl[f(k) - f(-k)\bigr] - \bigl[f(3k) - f(-3k)\bigr] = 48kf'(0) - \frac{216}{60}k^5 f^{v}(0)$$

Changing $k$ to $h/2$ and translating the origin gives

$$\frac{1}{24h}\bigl[f(-h) - 27f(0) + 27f(h) - f(2h)\bigr] = f'(h/2) - \frac{3}{640}h^4 f^{v}(h/2), \tag{11.11}$$

and the coefficient of the error term is much smaller.

The previous example of the derivative of $\sin x$ at $x = 0.2$ with $h = 0.1$ gives, using this formula:

$$\frac{1}{2.4}[0.0499792 - 27 \times 0.1494381 + 27 \times 0.2474040 - 0.3428978] = 0.9800661,$$

and the error is less by a factor of about 7. 
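
For comparison with the preceding example, here is a sketch of Eq. (11.11) with the origin placed at the point where the derivative is wanted, so that the samples sit at $x \pm h/2$ and $x \pm 3h/2$. The function name is made up for this illustration.

```python
import math

def four_point_between(f, x, h):
    """Eq. (11.11): derivative at x from samples at x -/+ h/2 and x -/+ 3h/2."""
    return (f(x - 1.5 * h) - 27 * f(x - 0.5 * h)
            + 27 * f(x + 0.5 * h) - f(x + 1.5 * h)) / (24 * h)

d_between = four_point_between(math.sin, 0.2, 0.1)
print(d_between, math.cos(0.2))
```

With $h = 0.1$ the samples are taken at $x = 0.05$, $0.15$, $0.25$, and $0.35$, the same four values used in the hand calculation.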

You can find higher derivatives the same way. 

$$f(h) = f(0) + hf'(0) + \frac{1}{2}h^2 f''(0) + \frac{1}{6}h^3 f'''(0) + \frac{1}{24}h^4 f''''(0) + \cdots$$

$$f(h) + f(-h) = 2f(0) + h^2 f''(0) + \frac{1}{12}h^4 f''''(0) + \cdots$$

$$f''(0) = \frac{f(-h) - 2f(0) + f(h)}{h^2} - \frac{1}{12}h^2 f''''(0) + \cdots \tag{11.12}$$

Notice that the numerical approximation for $f''(0)$ is even in $h$ because the second derivative is unchanged if $x$ is changed
to $-x$.

You can get any of these expressions for higher derivatives recursively, though finding the error estimates requires
the series method. The above expression for $f''(0)$ can be viewed as a combination of first derivative formulas:

$$f''(0) \approx \bigl[f'(h/2) - f'(-h/2)\bigr]/h \approx \frac{1}{h}\left[\frac{f(h) - f(0)}{h} - \frac{f(0) - f(-h)}{h}\right] = \bigl[f(h) - 2f(0) + f(-h)\bigr]/h^2 \tag{11.13}$$


Similarly, the third and higher derivatives can be computed. The numbers that appear in these numerical derivatives are 
simply the binomial coefficients, Eq. (2.18). 

11.4 Integration 

The basic definition of an integral is the limit of a sum as the intervals approach zero.

$$\sum_i f(\xi_i)(x_{i+1} - x_i) \qquad (x_i \le \xi_i \le x_{i+1}), \tag{11.14}$$

This is more fully explained in section 1.6, and it is the basis for the various methods of numerical evaluations of integrals.

The simplest choices to evaluate the integral of $f(x)$ over the domain $x_0$ to $x_0 + h$ would be to take the position
of $\xi$ at one of the endpoints or maybe in the middle (here $h$ is small).

$$\begin{aligned}
\int_{x_0}^{x_0+h} f(x)\,dx &\approx f(x_0)h & &\text{(a)} \\
\text{or}\quad &\approx f(x_0 + h)h & &\text{(b)} \\
\text{or}\quad &\approx f(x_0 + h/2)h \quad\text{(midpoint rule)} & &\text{(c)} \\
\text{or}\quad &\approx \bigl[f(x_0) + f(x_0 + h)\bigr]h/2 \quad\text{(trapezoidal rule)} & &\text{(d)}
\end{aligned} \tag{11.15}$$

The last expression is the average of the first two.

I can now compare the errors in all of these approximations. Set $x_0 = 0$.

$$\int_0^h dx\,f(x) = \int_0^h dx\left[f(0) + xf'(0) + \frac{1}{2}x^2 f''(0) + \frac{1}{6}x^3 f'''(0) + \cdots\right]$$

$$= hf(0) + \frac{1}{2}h^2 f'(0) + \frac{1}{6}h^3 f''(0) + \frac{1}{24}h^4 f'''(0) + \cdots$$

This immediately gives the error in formula (a):

$$\text{error (a)} = hf(0) - \int_0^h dx\,f(x) \approx -\frac{1}{2}h^2 f'(0). \tag{11.16}$$

The error for expression (b) requires another expansion,

$$\text{error (b)} = hf(h) - \int_0^h dx\,f(x) = h\bigl[f(0) + hf'(0) + \cdots\bigr] - \left[hf(0) + \frac{1}{2}h^2 f'(0) + \cdots\right] \approx \frac{1}{2}h^2 f'(0) \tag{11.17}$$

Since this is the opposite sign from the previous error, it is immediately clear that the error in (d) will be less,
because (d) is the average of (a) and (b).

$$\text{error (d)} = \left[f(0) + f(0) + hf'(0) + \frac{1}{2}h^2 f''(0) + \cdots\right]\frac{h}{2} - \left[hf(0) + \frac{1}{2}h^2 f'(0) + \frac{1}{6}h^3 f''(0) + \cdots\right]$$

$$\approx \left(\frac{1}{4} - \frac{1}{6}\right)h^3 f''(0) = \frac{1}{12}h^3 f''(0) \tag{11.18}$$

Similarly, the error in (c) is

$$\text{error (c)} = h\left[f(0) + \frac{1}{2}hf'(0) + \frac{1}{8}h^2 f''(0) + \cdots\right] - \left[hf(0) + \frac{1}{2}h^2 f'(0) + \frac{1}{6}h^3 f''(0) + \cdots\right] \approx -\frac{1}{24}h^3 f''(0) \tag{11.19}$$

The errors in the (c) and (d) formulas are both therefore the order of $h^3$.

Notice that just as the errors in formulas (a) and (b) canceled to highest order when you averaged them, the same
thing can happen between formulas (c) and (d). Here however you need a weighted average, with twice as much of (c)
as of (d). [$1/12 - 2/24 = 0$]

$$\frac{1}{3}\text{(d)} + \frac{2}{3}\text{(c)} = \bigl[f(x_0) + f(x_0 + h)\bigr]\frac{h}{6} + f(x_0 + h/2)\frac{4}{6}h \tag{11.20}$$

This is known as Simpson's rule. 

Simpson’s Rule 

Before applying this last result, I'll go back and derive it in a more systematic way, putting it into the form you'll see 
most often. 

Integrate Taylor's expansion over a symmetric domain to simplify the algebra:

$$\int_{-h}^{h} dx\,f(x) = 2hf(0) + \frac{2}{6}h^3 f''(0) + \frac{2}{120}h^5 f''''(0) + \cdots \tag{11.21}$$

I'll try to approximate this by a three point formula $\alpha f(-h) + \beta f(0) + \gamma f(h)$ where $\alpha$, $\beta$, and $\gamma$ are unknown. Because
of the symmetry of the problem, you can anticipate that $\alpha = \gamma$, but let that go for now and it will come out of the
algebra.

$$\begin{aligned}
\alpha f(-h) + \beta f(0) + \gamma f(h) = {}& \alpha\left[f(0) - hf'(0) + \frac{1}{2}h^2 f''(0) - \frac{1}{6}h^3 f'''(0) + \frac{1}{24}h^4 f''''(0) + \cdots\right] \\
& + \beta f(0) \\
& + \gamma\left[f(0) + hf'(0) + \frac{1}{2}h^2 f''(0) + \frac{1}{6}h^3 f'''(0) + \frac{1}{24}h^4 f''''(0) + \cdots\right]
\end{aligned}$$

You now determine the three constants by requiring that the two series for the same integral agree to as high an
order as is possible for any $f$.

$$2h = \alpha + \beta + \gamma, \qquad 0 = -\alpha h + \gamma h, \qquad \frac{1}{3}h^3 = \frac{1}{2}(\alpha + \gamma)h^2 \quad\Longrightarrow\quad \alpha = \gamma = h/3, \quad \beta = 4h/3$$

and so,

$$\int_{-h}^{h} dx\,f(x) \approx \frac{h}{3}\bigl[f(-h) + 4f(0) + f(h)\bigr]. \tag{11.22}$$

The error term (the "truncation error") is

$$\frac{h}{3}\bigl[f(-h) + 4f(0) + f(h)\bigr] - \int_{-h}^{h} dx\,f(x) \approx \frac{1}{12}\cdot\frac{1}{3}h^5 f''''(0) - \frac{1}{60}h^5 f''''(0) = \frac{1}{90}h^5 f''''(0) \tag{11.23}$$

Simpson's rule is exact up through cubics, because the fourth and higher derivatives vanish in that case. It's worth
noting that there is also an elementary derivation of Simpson's rule: Given three points, there is a unique quadratic in
$x$ that passes through all of them. Take the three points to be $(-h, f(-h))$, $(0, f(0))$, and $(h, f(h))$, then integrate
the resulting polynomial. Express your answer for the integral in terms of the values of $f$ at the three points, and you
get the above Simpson's rule. This has the drawback that it gives no estimate of the error.

To apply Simpson’s rule, it is necessary to divide the region of integration into an even number of pieces and apply 
the above formula to each pair. 


$$\begin{aligned}
\int dx\,f(x) &\approx \frac{h}{3}\bigl[f(x_0) + 4f(x_1) + f(x_2)\bigr] + \frac{h}{3}\bigl[f(x_2) + 4f(x_3) + f(x_4)\bigr] + \cdots + \frac{h}{3}\bigl[f(x_{N-2}) + 4f(x_{N-1}) + f(x_N)\bigr] \\
&= \frac{h}{3}\bigl[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{N-1}) + f(x_N)\bigr]
\end{aligned} \tag{11.24}$$

Example:

$$\int_0^1 \frac{4}{1 + x^2}\,dx = 4\tan^{-1}x\,\Big|_0^1 = \pi$$

Divide the interval 0 to 1 into four pieces, then

$$\int_0^1 \frac{4}{1 + x^2}\,dx \approx \frac{4}{12}\left[1 + 4\,\frac{1}{1 + (1/4)^2} + 2\,\frac{1}{1 + (1/2)^2} + 4\,\frac{1}{1 + (3/4)^2} + \frac{1}{1 + 1}\right] = 3.1415686$$

as compared to $\pi = 3.1415927\ldots$.

When the function to be integrated is smooth, this gives very accurate results. 
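
The composite rule of Eq. (11.24) can be sketched as a short loop. The function name `simpson` and its argument list are choices made for this illustration; the only requirement carried over from the text is that the number of pieces be even.

```python
def simpson(f, a, b, n):
    """Composite Simpson's rule, Eq. (11.24), using n (even) equal pieces."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of pieces")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate between weight 4 (odd i) and weight 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

approx_pi = simpson(lambda x: 4 / (1 + x * x), 0.0, 1.0, 4)
print(approx_pi)
```

With four pieces this repeats the hand calculation above; increasing `n` shows the error falling roughly as $h^4$.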
Integration

If the integrand is known at all points of the interval and not just at discrete locations as for tabulated or experimental
data, there is more freedom that you can use to gain higher accuracy even if you use only a two point formula:

$$\int_{-h}^{h} f(x)\,dx \approx \alpha\bigl[f(\beta) + f(-\beta)\bigr]$$

I could try picking two arbitrary points, not symmetrically placed in the interval, but the previous experience with
Simpson's rule indicates that the result will come out as indicated. (Though it's easy to check what happens if you pick
two general points in the interval.)

$$2hf(0) + \frac{1}{3}h^3 f''(0) + \frac{1}{60}h^5 f''''(0) + \cdots = \alpha\left[2f(0) + \beta^2 f''(0) + \frac{1}{12}\beta^4 f''''(0) + \cdots\right]$$

To make this an equality through the low orders implies

$$2h = 2\alpha, \quad \alpha = h \qquad\text{and}\qquad \frac{1}{3}h^3 = \alpha\beta^2, \quad \beta = h/\sqrt{3} \tag{11.25}$$

with an error term

$$\frac{1}{12}\cdot\frac{1}{9}h^5 f''''(0) - \frac{1}{60}h^5 f''''(0) = -\frac{1}{135}h^5 f''''(0),$$

and

$$\int_{-h}^{h} f(x)\,dx \approx h\left[f\bigl(h/\sqrt{3}\bigr) + f\bigl(-h/\sqrt{3}\bigr)\right] \tag{11.26}$$

With just two points, this expression yields an accuracy equal to the three point Simpson formula.
Notice that the two points found in this way are roots of a certain quadratic

$$\left(x - \frac{1}{\sqrt{3}}\right)\left(x + \frac{1}{\sqrt{3}}\right) = x^2 - \frac{1}{3},$$

which is proportional to

$$\frac{3}{2}x^2 - \frac{1}{2} = P_2(x), \tag{11.27}$$

the Legendre polynomial of second order. Recall section 4.11.

This approach to integration, called Gaussian integration, or Gaussian quadrature, can be extended to more points,
as for example

$$\int_{-h}^{h} f(x)\,dx \approx \alpha f(-\beta) + \gamma f(0) + \alpha f(\beta)$$

The same expansion procedure leads to the result

$$\frac{h}{9}\left[5f\left(-h\sqrt{\tfrac{3}{5}}\right) + 8f(0) + 5f\left(h\sqrt{\tfrac{3}{5}}\right)\right] \tag{11.28}$$

with an error proportional to $h^7 f^{(6)}(0)$. The polynomial with roots $0, \pm\sqrt{3/5}$ is

$$\frac{5}{2}x^3 - \frac{3}{2}x = P_3(x), \tag{11.29}$$

the third order Legendre polynomial.

For an integral $\int_a^b f(x)\,dx$, let $x = [(a + b)/2] + z[(b - a)/2]$. Then for the domain $-1 < z < 1$, $x$ covers the
whole integration interval.

$$\int_a^b f(x)\,dx = \frac{b - a}{2}\int_{-1}^{1} dz\,f(x)$$

When you use an integration scheme such as Gauss's, it is in the form of a weighted sum over points. The weights and
the points are defined by equations such as (11.26) or (11.28).

$$\int_{-1}^{1} dz\,f(z) \to \sum_k w_k f(z_k)$$

or

$$\int_a^b f(x)\,dx = \frac{b - a}{2}\sum_k w_k f(x_k), \qquad x_k = [(a + b)/2] + z_k[(b - a)/2] \tag{11.30}$$
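
A sketch of Eq. (11.30) follows, with the two-point and three-point nodes and weights read off from Eqs. (11.26) and (11.28) with $h = 1$. The names `GAUSS_2`, `GAUSS_3`, and `gauss` are invented here. The rules are applied once over the whole interval, without subdividing it.

```python
import math

# (z_k, w_k) pairs on -1 < z < 1, taken from Eqs. (11.26) and (11.28) with h = 1
GAUSS_2 = [(-1 / math.sqrt(3), 1.0), (1 / math.sqrt(3), 1.0)]
GAUSS_3 = [(-math.sqrt(3 / 5), 5 / 9), (0.0, 8 / 9), (math.sqrt(3 / 5), 5 / 9)]

def gauss(f, a, b, rule):
    """Eq. (11.30): map the nodes from (-1, 1) onto (a, b) and form the weighted sum."""
    half = (b - a) / 2
    mid = (a + b) / 2
    return half * sum(w * f(mid + z * half) for z, w in rule)

integrand = lambda x: 4 / (1 + x * x)
two = gauss(integrand, 0.0, 1.0, GAUSS_2)
three = gauss(integrand, 0.0, 1.0, GAUSS_3)
print(two, three, math.pi)
```

The two-point rule integrates cubics exactly and the three-point rule integrates fifth-degree polynomials exactly, which makes a convenient check on the tabulated nodes.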


Many other properties of Gaussian integration are discussed in the two books by C. Lanczos, "Linear Differential
Operators," "Applied Analysis," both available in Dover reprints. The general expressions for the integration points as
roots of Legendre polynomials and expressions for the coefficients are there. The important technical distinction he points
out between the Gaussian method and generalizations of Simpson's rule involving more points is in the divergences for
large numbers of points. Gauss's method does not suffer from this defect. In practice, there is rarely any problem with
using the ordinary Simpson rule as indicated above, though it will require more points than the more elegant Gauss's
method. When problems do arise with either of these methods, they often occur because the function is ill-behaved,
and the high derivatives are very large. In this case it can be more accurate to use a method with a lower order
derivative for the truncation error. In an extreme case such as integrating a non-differentiable function, the apparently
worst method, Eq. (11.15)(a), can be the best.

11.5 Differential Equations 

To solve the first order differential equation 


$$y' = f(x, y) \qquad y(x_0) = y_0, \tag{11.31}$$

the simplest algorithm is Euler's method. The initial conditions are $y(x_0) = y_0$, and $y'(x_0) = f(x_0, y_0)$, and a straight
line extrapolation is

$$y(x_0 + h) = y_0 + hf(x_0, y_0). \tag{11.32}$$

You can now iterate this procedure using this newly found value of $y$ as a new starting condition to go from $x_0 + h$ to
$x_0 + 2h$.

Runge-Kutta 

Euler's method is not very accurate. For an improvement, change from a straight line extrapolation to a parabolic one.
Take $x_0 = 0$ to keep the algebra down, and try a solution near 0 in the form $y(x) = \alpha + \beta x + \gamma x^2$; evaluate $\alpha$, $\beta$, and
$\gamma$ so that the differential equation is satisfied near $x = 0$,

$$y' = \beta + 2\gamma x = f(x, \alpha + \beta x + \gamma x^2).$$

Recall the Taylor series expansion for a function of two variables, section 2.5:

$$\begin{aligned}
f(x, y) = {}& f(x_0, y_0) + (x - x_0)D_1 f(x_0, y_0) + (y - y_0)D_2 f(x_0, y_0) + \frac{1}{2}(x - x_0)^2 D_1 D_1 f(x_0, y_0) \\
& + \frac{1}{2}(y - y_0)^2 D_2 D_2 f(x_0, y_0) + (x - x_0)(y - y_0)D_1 D_2 f(x_0, y_0) + \cdots
\end{aligned} \tag{11.33}$$

$$\beta + 2\gamma x = f(0, \alpha) + xD_1 f(0, \alpha) + (\beta x + \gamma x^2)D_2 f(0, \alpha) + \cdots. \tag{11.34}$$

The initial condition is at $x = 0$, $y = y_0$, so $\alpha = y_0$. Equate coefficients of powers of $x$ as high as is possible (here
through $x^1$).

$$\beta = f(0, \alpha) \qquad 2\gamma = D_1 f(0, \alpha) + \beta D_2 f(0, \alpha).$$

(If you set $\gamma = 0$, this is Euler's method.)

$$y(h) = y_0 + hf(0, y_0) + \frac{h^2}{2}\bigl[D_1 f(0, y_0) + f(0, y_0)D_2 f(0, y_0)\bigr]. \tag{11.35}$$

The next problem is to evaluate these derivatives. If you can easily do them analytically, you can choose to do
that. Otherwise, since they appear in a term that is multiplied by $h^2$, it is enough to use the simplest approximation for
the numerical derivative,

$$D_1 f(0, y_0) = \bigl[f(h, y_0) - f(0, y_0)\bigr]/h \tag{11.36}$$

You cannot expect to use the same interval, $h$, for the $y$ variable; it might not even have the same dimensions,

$$D_2 f(0, y_0) = \bigl[f(j, y_0 + k) - f(j, y_0)\bigr]/k. \tag{11.37}$$

where $j$ and $k$ are the order of $h$. Note that because this term appears in an expression multiplied by $h^2$, it doesn't
matter what $j$ is. You can choose it for convenience. Possible values for these are

(1) $j = 0$, $k = hf(0, y_0)$
(2) $j = 0$, $k = hf(h, y_0)$
(3) $j = h$, $k = hf(0, y_0)$
(4) $j = h$, $k = hf(h, y_0)$.

The third of these choices for example gives

$$\begin{aligned}
y &= y_0 + hf(0, y_0) + \frac{h^2}{2}\left[\frac{1}{h}\bigl[f(h, y_0) - f(0, y_0)\bigr] + f(0, y_0)\,\frac{f(h, y_0 + k) - f(h, y_0)}{hf(0, y_0)}\right] \\
&= y_0 + \frac{h}{2}f(0, y_0) + \frac{h}{2}f\bigl(h, y_0 + hf(0, y_0)\bigr)
\end{aligned} \tag{11.38}$$


This procedure, a second order Runge-Kutta method, is a moderately accurate method for advancing from one point to 
the next in the solution of a differential equation. It requires evaluating the function twice for each step of the iteration. 


Example: $y' = 1 + y^2$, $y(0) = 0$. Let $h = 0.1$

  x      y(11.32)   y(11.38)   y(11.41)   tan x
  0.     0.         0.         0.         0.
  0.1    0.10       0.10050    0.10025    0.10033
  0.2    0.20100    0.20304    0.20252    0.20271
  0.3    0.30504    0.30981    0.30900    0.30934
  0.4    0.41435    0.42341    0.42224    0.42279
  0.5    0.53151    0.54702    0.54539    0.54630        (11.39)

The fractional error at $x = 0.5$ with the second order method of equation (11.38) is 0.13% and with Euler it is $-2.7$%
(first column). The method in Eq. (11.41) has error $-0.17$%. The fourth order version of Eq. (11.42) has an error
$-3.3 \times 10^{-7}$, more places than are shown in the table.

The equation (11.38) is essentially a trapezoidal integration, like Eq. (11.15) (d). You would like to evaluate the
function $y$ at $x = h$ by

$$y(h) = y_0 + \int_0^h dx\,f\bigl(x, y(x)\bigr) \tag{11.40}$$

A trapezoidal integration would be

$$y(h) \approx y_0 + \frac{h}{2}\bigl[f\bigl(0, y(0)\bigr) + f\bigl(h, y(h)\bigr)\bigr]$$

BUT, you don't know the $y(h)$ that you need to evaluate the last term, so you estimate it by using the approximate
result that you would have gotten from the simple Euler method, Eq. (11.32). The result is exactly equation (11.38).

If you can interpret this Runge-Kutta method as just an application of trapezoidal integration, why not use one
of the other integration methods, such as the midpoint rule, Eq. (11.15) (c)? This says that you would estimate the
integral in Eq. (11.40) by estimating the value of $f$ at the midpoint of $0 < x < h$.

$$y(h) \approx y_0 + hf\left(\frac{h}{2}, y\left(\frac{h}{2}\right)\right) \approx y_0 + hf\left(\frac{h}{2}, y_0 + \frac{h}{2}f(0, y_0)\right) \tag{11.41}$$

This is the same order of accuracy as the preceding method, and it takes the same number of function evaluations as 
that one (two). 

A commonly used version of this is the fourth order Runge-Kutta method:

$$y = y_0 + \frac{1}{6}\bigl[k_1 + 2k_2 + 2k_3 + k_4\bigr] \tag{11.42}$$

$$\begin{aligned}
k_1 &= hf(0, y_0) & k_2 &= hf(h/2, y_0 + k_1/2) \\
k_3 &= hf(h/2, y_0 + k_2/2) & k_4 &= hf(h, y_0 + k_3).
\end{aligned}$$


You can look up a fancier version of this called the Runge-Kutta-Fehlberg method. It’s one of the better techniques 
around. 
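
The four stepping rules of this section, Eqs. (11.32), (11.38), (11.41), and (11.42), differ only in how a single step is taken, so a sketch can share one driver loop. All function names below are invented for this illustration, and the step size is held fixed.

```python
import math

def euler_step(f, x, y, h):             # Eq. (11.32)
    return y + h * f(x, y)

def rk2_trapezoid_step(f, x, y, h):     # Eq. (11.38)
    return y + 0.5 * h * (f(x, y) + f(x + h, y + h * f(x, y)))

def rk2_midpoint_step(f, x, y, h):      # Eq. (11.41)
    return y + h * f(x + 0.5 * h, y + 0.5 * h * f(x, y))

def rk4_step(f, x, y, h):               # Eq. (11.42)
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)
    k3 = h * f(x + h / 2, y + k2 / 2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, x0, y0, h, n):
    """Take n steps of size h from (x0, y0) and return the final y."""
    x, y = x0, y0
    for _ in range(n):
        y = step(f, x, y, h)
        x += h
    return y

f = lambda x, y: 1 + y * y              # y' = 1 + y**2, whose solution is tan x
exact = math.tan(0.5)
results = {}
for step in (euler_step, rk2_trapezoid_step, rk2_midpoint_step, rk4_step):
    results[step.__name__] = integrate(step, f, 0.0, 0.0, 0.1, 5)
    print(step.__name__, results[step.__name__], results[step.__name__] / exact - 1)
```

The last printed column is the fractional error at $x = 0.5$, to be compared with the percentages quoted after the table.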
Higher Order Equations

How can you use either the Euler or the Runge-Kutta method to solve a second order differential equation? Answer:
Turn it into a pair of first order equations.

$$y'' = f(x, y, y') \longrightarrow y' = v, \quad\text{and}\quad v' = f(x, y, v) \tag{11.43}$$

The Euler method, Eq. (11.32) becomes

$$y(x_0 + h) = y(x_0) + hv(x_0), \quad\text{and}\quad v(x_0 + h) = v(x_0) + hf\bigl(x_0, y(x_0), v(x_0)\bigr)$$

The construction for Runge-Kutta is essentially the same. 
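
As a sketch of the pair-of-equations idea, here is the Euler version applied to $y'' = -y$ with $y(0) = 0$, $y'(0) = 1$, whose solution is $\sin x$. This test equation and the function name `euler_second_order` are assumptions chosen for illustration; they do not come from the text.

```python
import math

def euler_second_order(f, x0, y0, v0, h, n):
    """Euler's method for y'' = f(x, y, y'), written as y' = v and v' = f(x, y, v)."""
    x, y, v = x0, y0, v0
    for _ in range(n):
        # the right-hand side uses the old y and v for both updates
        y, v = y + h * v, v + h * f(x, y, v)
        x += h
    return y, v

y_end, v_end = euler_second_order(lambda x, y, v: -y, 0.0, 0.0, 1.0, 0.001, 1000)
print(y_end, math.sin(1.0), v_end, math.cos(1.0))
```

The small step is needed because Euler's method is only first order; the same pairing carries over to the Runge-Kutta steps by treating $(y, v)$ as a two-component unknown.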

Adams Methods 

The Runge-Kutta algorithm has the advantage that it is self-starting; it requires only the initial condition to go on 
to the next step. It has the disadvantage that it is inefficient. In going from one step to the next, it ignores all the 
information available from any previous steps. The opposite approach leads to the Adams methods, though these are 
not as commonly used any more. I’m going to develop a little of the subject mostly to show that the methods that I’ve 
used so far can lead to disaster if you're not careful. 

Shift the origin to be the point at which you want the new value of $y$. Assume that you already know $y$ at $-h$,
$-2h$, $\ldots$, $-Nh$. Because of the differential equation $y' = f(x, y)$, you also know $y'$ at these points.

Assume

$$y(0) = \sum_{k=1}^{N} \alpha_k y(-kh) + h\sum_{k=1}^{N} \beta_k y'(-kh) \tag{11.44}$$

With $2N$ parameters, you can get this accurate to order $h^{2N-1}$,

$$y(-kh) = \sum_{n=0}^{\infty} (-kh)^n \frac{y^{(n)}(0)}{n!}$$

Substitute this into the equation for $y(0)$:

$$y(0) = \sum_{k=1}^{N} \alpha_k \sum_{n=0}^{\infty} (-kh)^n \frac{y^{(n)}(0)}{n!} + h\sum_{k=1}^{N} \beta_k \sum_{n=0}^{\infty} (-kh)^n \frac{y^{(n+1)}(0)}{n!}$$

This should be an identity to as high an order as possible. The coefficient of $h^0$ gives

$$1 = \sum_{k=1}^{N} \alpha_k \tag{11.45}$$

The next orders are

$$\begin{aligned}
0 &= \sum_k \alpha_k(-kh) + h\sum_k \beta_k \\
0 &= \sum_k \frac{1}{2}\alpha_k(-kh)^2 + h\sum_k \beta_k(-kh) \\
&\;\;\vdots
\end{aligned} \tag{11.46}$$

$N = 1$ is Euler's method again.

$N = 2$ gives

$$\begin{aligned}
\alpha_1 + \alpha_2 &= 1 & \alpha_1 + 2\alpha_2 &= \beta_1 + \beta_2 \\
\alpha_1 + 4\alpha_2 &= 2(\beta_1 + 2\beta_2) & \alpha_1 + 8\alpha_2 &= 3(\beta_1 + 4\beta_2)
\end{aligned}$$

The solution of these equations is

$$\alpha_1 = -4 \qquad \alpha_2 = +5 \qquad \beta_1 = +4 \qquad \beta_2 = +2$$

$$y(0) = -4y(-h) + 5y(-2h) + h\bigl[4y'(-h) + 2y'(-2h)\bigr] \tag{11.47}$$

To start this algorithm off, you need two pieces of information: the values of $y$ at $-h$ and at $-2h$. This is in contrast
to Runge-Kutta, which needs only one point.

Example: Solve $y' = y$, $y(0) = 1$ $(h = 0.1)$

I could use Runge-Kutta to start and then switch to Adams as soon as possible. For the purpose of this example, I'll
just take the exact value of $y$ at $x = 0.1$.

$$e^{.1} = 1.105170918$$

$$\begin{aligned}
y(.2) &= -4y(.1) + 5y(0) + .1\bigl[4f\bigl(.1, y(.1)\bigr) + 2f\bigl(0, y(0)\bigr)\bigr] \\
&= -4y(.1) + 5y(0) + .4y(.1) + .2y(0) \\
&= -3.6y(.1) + 5.2y(0) \\
&= 1.2213\underline{8}4695
\end{aligned}$$

The exact value is $e^{.2} = 1.221402758$; the first error is in the underlined term. Continuing the calculation to higher
values of $x$,

  x      y
  .3     1.3499038
  .4     1.491547
  .5     1.648931
  .6     1.81988
  .7     2.0228
  .8     2.1812
  .9     2.666
  1.0    1.74
  1.1    7.59
  1.2    -18.26
  1.3    105.22

Everything is going very smoothly for a while, though the error is creeping up. At around $x = 1$, the numerical
solution goes into wild oscillation and is completely unstable. The reason for this is in the coefficients $-4$ and $+5$ of
$y(-h)$ and $y(-2h)$. Small errors are magnified by these large factors. (The coefficients of $y'$ are not any trouble because
of the factor $h$ in front.)
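
The runaway is easy to reproduce. The sketch below iterates Eq. (11.47) for $y' = y$ in double precision, starting from the exact $y(0.1)$; the function name `adams_two_step` is invented here. Because the unstable term is seeded by very small differences, the printed values need not agree digit for digit with the table once that term takes over. Only the growing, sign-alternating error is the point.

```python
import math

def adams_two_step(h, n_steps):
    """Iterate Eq. (11.47) for y' = y from y(0) = 1 and the exact value y(h)."""
    ys = [1.0, math.exp(h)]
    for _ in range(n_steps):
        y1, y2 = ys[-1], ys[-2]       # y(-h) and y(-2h) relative to the new point
        ys.append(-4 * y1 + 5 * y2 + h * (4 * y1 + 2 * y2))   # y' = y at both points
    return ys

h = 0.1
ys = adams_two_step(h, 12)            # values at x = 0, 0.1, ..., 1.3
errors = [y - math.exp(n * h) for n, y in enumerate(ys)]
for n, (y, err) in enumerate(zip(ys, errors)):
    print(f"{n * h:.1f}  {y:14.7f}  {err:12.3e}")
```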

Instability 

You can compute the growth of this error explicitly in this simple example. The equation (11.47) together with $y' = y$ is

$$y(0) = -3.6y(-h) + 5.2y(-2h),$$

or in terms of an index notation

$$y_n = -3.6y_{n-1} + 5.2y_{n-2}$$

This is a linear, constant coefficient, difference equation, and the method for solving it is essentially the same as for a
linear differential equation: assume an exponential form $y_n = k^n$.

$$\begin{aligned}
k^n &= -3.6k^{n-1} + 5.2k^{n-2} \\
k^2 + 3.6k - 5.2 &= 0 \\
k &= 1.11 \text{ and } -4.71
\end{aligned}$$

Just as with the differential equation, the general solution is a linear combination of these two functions of $n$:

$$y_n = A(1.11)^n + B(-4.71)^n,$$

where $A$ and $B$ are determined by two conditions, typically specifying $y_1$ and $y_2$. If $B = 0$, then $y_n$ is proportional to
$1.11^n$ and it is the well behaved exponential solution that you expect. If, however, there is even a little bit of $B$ present
(perhaps because of roundoff errors), that term will eventually dominate and cause the large oscillations. If $B$ is as small
as $10^{-6}$, then when $n = 9$ the unwanted term is greater than 1.
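
The two roots and the step at which the unwanted term passes 1 can be checked with a few lines. This sketch takes $B = 10^{-6}$ from the text; the variable names are made up.

```python
import math

# Roots of k**2 + 3.6 k - 5.2 = 0 from the quadratic formula
disc = math.sqrt(3.6**2 + 4 * 5.2)
k_good = (-3.6 + disc) / 2    # close to exp(0.1), the wanted solution
k_bad = (-3.6 - disc) / 2     # the parasitic root, larger than 1 in magnitude

B = 1e-6
n_cross = next(n for n in range(1, 100) if B * abs(k_bad) ** n > 1)
print(k_good, k_bad, n_cross)
```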

When I worked out the coefficients in Eq. (11.47) the manipulations didn’t look all that different from those 
leading to numerical derivatives or integrals, but the result was useless. This is a warning. You're in treacherous territory 
here; tread cautiously. 

Are Adams-type methods useless? No, but you have to modify the development in order to get a stable algorithm.
The difficulty in assuming the form

$$y(0) = \sum_{k=1}^{N} \alpha_k y(-kh) + h\sum_{k=1}^{N} \beta_k y'(-kh)$$

is that the coefficients $\alpha_k$ are too large. To cure this, you can give up some of the $2N$ degrees of freedom that the
method started with, and pick the $\alpha_k$ a priori to avoid instability. There are two common ways to do this, consistent
with the constraint that must be kept on the $\alpha$'s,

$$\sum_{k=1}^{N} \alpha_k = 1$$

One way is to pick all the $\alpha_k$ to equal $1/N$. Another way is to pick $\alpha_1 = 1$ and all the others $= 0$, and both of these
methods are numerically stable. The book by Lanczos in the bibliography goes into these techniques, and there are
tabulations of these and other methods in Abramowitz and Stegun.

Backwards Iteration 

Before leaving the subject, there is one more kind of instability that you can encounter. If you try to solve $y'' = +y$ with
$y(0) = 1$ and $y'(0) = -1$, the solution is $e^{-x}$. If you use any stable numerical algorithm to solve this problem, it will soon
deviate arbitrarily far from the desired one. The reason is that the general solution of this equation is $y = Ae^{x} + Be^{-x}$.
Any numerical method will, through rounding errors, generate a little bit of the undesired solution, $e^{+x}$. Eventually, this
must overwhelm the correct solution. No algorithm, no matter how stable, can get around this.

There is a clever trick that sometimes works in cases like this: backwards iteration. Instead of going from zero
up, start at some large value of $x$ and iterate downward. In this direction it is the desired solution, $e^{-x}$, that is unstable,
and the $e^{+x}$ is damped out. Pick an arbitrary value, say $x = 10$, and assign an arbitrary value to $y(10)$, say 0. Next,
pick an arbitrary value for $y'(10)$, say 1. Use these as initial conditions (terminal conditions?) and solve the differential
equation moving left; necessarily the dominant term will be the unstable one, $e^{-x}$, and independent of the choice of
initial conditions, it will be the solution. At the end it is only necessary to multiply all the terms by a scale factor to
reduce the value at $x = 0$ to the desired one; automatically, the value of $y'(0)$ will be correct. What you are really
doing by this method is to replace the initial value problem by a two point boundary value problem. You require that
the function approach zero for large $x$.
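
Here is a sketch of backwards iteration for $y'' = +y$, using the fourth order Runge-Kutta step of Eq. (11.42) on the pair $y' = v$, $v' = y$ with a negative step. The starting point $x = 10$ and the values $y(10) = 0$, $y'(10) = 1$ are the arbitrary choices suggested above; the function names and the step size $0.01$ are assumptions made for this illustration.

```python
import math

def rk4_pair_step(y, v, h):
    """One RK4 step for y' = v, v' = y (that is, y'' = +y); h may be negative."""
    k1y, k1v = v, y
    k2y, k2v = v + 0.5 * h * k1v, y + 0.5 * h * k1y
    k3y, k3v = v + 0.5 * h * k2v, y + 0.5 * h * k2y
    k4y, k4v = v + h * k3v, y + h * k3y
    y_new = y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    v_new = v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y_new, v_new

def backwards_solution(x_start=10.0, h=0.01):
    """Integrate from x_start down to 0, then rescale so that y(0) = 1."""
    n = round(x_start / h)
    y, v = 0.0, 1.0                      # arbitrary terminal conditions
    table = [(y, v)]
    for _ in range(n):
        y, v = rk4_pair_step(y, v, -h)   # move left
        table.append((y, v))
    scale = 1.0 / table[-1][0]           # force the value at x = 0 to be 1
    table.reverse()                      # index i now corresponds to x = i * h
    return [(y * scale, v * scale) for y, v in table]

solution = backwards_solution()
y0, v0 = solution[0]
y1, v1 = solution[100]                   # x = 1
print(y0, v0, y1, math.exp(-1.0))
```

After rescaling, the slope at $x = 0$ comes out as $-1$ without having been imposed, which is the "automatically correct" behavior described above.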

11.6 Fitting of Data 

If you have a set of data in the form of independent and dependent variables $\{x_i, y_i\}$ $(i = 1, \ldots, N)$, and you have
proposed a model that this data is to be represented by a linear combination of some set of functions, $f_\mu(x)$

$$y = \sum_{\mu=1}^{M} \alpha_\mu f_\mu(x), \tag{11.48}$$

what values of $\alpha_\mu$ will represent the observations in the best way? There are several answers to this question depending
on the meaning of the word "best." The most commonly used one, largely because of its simplicity, is Gauss's method
of least squares.

Here there are $N$ data and there are $M$ functions that I will use to fit the data. You have to pick the functions
for yourself. You can choose them because they are the implications of a theoretical calculation; you can choose them
because they are simple; you can choose them because your daily horoscope suggested them. The sum of functions,
$\sum \alpha_\mu f_\mu$, now depends on only the $M$ parameters $\alpha_\mu$. The $f$s are fixed. The difference between this sum and the data
points $y_i$ is what you want to be as small as possible. You can't use the differences themselves because they will as likely
be negative as positive. The least squares method uses the sum of the squares of the differences between your sum of
functions and the data. This criterion for best fit is that the sum

$$\sum_{i=1}^{N}\left[y_i - \sum_{\mu=1}^{M} \alpha_\mu f_\mu(x_i)\right]^2 = N\sigma^2 \tag{11.49}$$

be a minimum. The mean square deviation of the theory from the experiment is to be least. This quantity $\sigma^2$ is called
the variance.

Some observations to make here: $N > M$, for otherwise there are more free parameters than data to fit them,
and almost any theory with enough parameters can be forced to fit any data. Also, the functions must be linearly
independent; if not, then you can throw away some and not alter the result, and the solution is not unique. A further
point: there is no requirement that all of the $x_i$ are different; you may have repeated the measurements at some points.
Minimizing this is now a problem in ordinary calculus with $M$ variables.

$$\frac{\partial}{\partial \alpha_\nu}\sum_{i=1}^{N}\left[y_i - \sum_{\mu=1}^{M} \alpha_\mu f_\mu(x_i)\right]^2 = -2\sum_i\left[y_i - \sum_\mu \alpha_\mu f_\mu(x_i)\right]f_\nu(x_i) = 0$$

$$\text{rearrange:}\qquad \sum_\mu\left[\sum_i f_\nu(x_i)f_\mu(x_i)\right]\alpha_\mu = \sum_i y_i f_\nu(x_i) \tag{11.50}$$

These linear equations are easily expressed in terms of matrices. 


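$$Ca = b, \tag{11.51}$$

where

$$C_{\nu\mu} = \sum_{i=1}^{N} f_\nu(x_i)f_\mu(x_i)$$

$a$ is the column matrix with components $\alpha_\mu$ and $b$ has components $\sum_i y_i f_\nu(x_i)$.
The solution for $a$ is

$$a = C^{-1}b. \tag{11.52}$$

If $C$ turned out singular, so this inversion is impossible, the functions $f_\mu$ were not independent.
Example: Fit to a straight line

$$f_1(x) = 1 \qquad f_2(x) = x$$

Then $Ca = b$ is

$$\begin{pmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum y_i x_i \end{pmatrix}$$

The inverse is

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \frac{1}{N\sum x_i^2 - \left(\sum x_i\right)^2}\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & N \end{pmatrix}\begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \tag{11.53}$$

and the best fit line is

$$y = \alpha_1 + \alpha_2 x$$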
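
The straight-line case is small enough to code from the explicit inverse. In this sketch the function name `fit_line` is invented, and the data values are made up purely to exercise the formula; they are not measurements from the text.

```python
def fit_line(xs, ys):
    """Least squares line y = a1 + a2 x from the explicit 2x2 inverse above."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("C is singular: all x values coincide")
    a1 = (sxx * sy - sx * sxy) / det
    a2 = (n * sxy - sx * sy) / det
    return a1, a2

# Hypothetical data lying near the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a1, a2 = fit_line(xs, ys)
print(a1, a2)
```

The determinant vanishes only when every $x_i$ is the same, which is the case where $f_1$ and $f_2$ cannot be told apart on the data.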
11.7 Euclidean Fit

In fitting data to a combination of functions, the least squares method used Eq. (11.49) as a measure of how far the 
proposed function is from the data. If you’re fitting data to a straight line (or plane if you have more variables) there's 
another way to picture the distance. Instead of measuring the distance from a point to the curve vertically using only y, 
measure it as the perpendicular distance to the line. Why should this be any better? It's not, but it does have different 
uses, and a primary one is data compression. 



Do this in two dimensions, fitting the given data to a straight line, and to describe the line I'll use vector notation,
where the line is $\vec u + \alpha\vec v$ and the parameter $\alpha$ varies over the reals. First I need to answer the simple question: what is
the distance from a point to a line? The perpendicular distance from $\vec w$ to this line requires that

$$d^2 = (\vec w - \vec u - \alpha\vec v)^2$$

be a minimum. Differentiate this with respect to $\alpha$ and you have

$$(\vec w - \vec u - \alpha\vec v)\cdot(-\vec v) = 0 \quad\text{implying}\quad \alpha v^2 = (\vec w - \vec u)\cdot\vec v$$

For this value of $\alpha$ what is $d^2$?

$$d^2 = (\vec w - \vec u)^2 + \alpha^2 v^2 - 2\alpha\vec v\cdot(\vec w - \vec u) = (\vec w - \vec u)^2 - \frac{1}{v^2}\bigl[(\vec w - \vec u)\cdot\vec v\bigr]^2 \tag{11.54}$$


Is this plausible? (1) It's independent of the size of v, depending on its direction only. (2) It depends on only the 
difference vector between w and u, not on any other aspect of the vectors. (3) If I add any multiple of v to u, the result 
is unchanged. See problem 11.37. Also, can you find an easier way to get the result? Perhaps one that simply requires 
some geometric insight? 
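The three plausibility checks on Eq. (11.54) can also be tried numerically. This is a small sketch, assuming points and vectors are given as tuples of components; the function name `dist2_point_to_line` and the test values are hypothetical.

```python
def dist2_point_to_line(w, u, v):
    """Squared perpendicular distance from the point w to the line u + alpha*v, Eq. (11.54)."""
    d = [wi - ui for wi, ui in zip(w, u)]       # the difference vector w - u
    dd = sum(x * x for x in d)                  # (w - u)^2
    dv = sum(x * y for x, y in zip(d, v))       # (w - u) . v
    vv = sum(x * x for x in v)                  # v^2
    return dd - dv * dv / vv

w, u, v = (1.0, 2.0), (0.0, 0.0), (1.0, 0.0)    # the line is the x-axis
base = dist2_point_to_line(w, u, v)

# (1) Changing the size of v changes nothing.
scaled = dist2_point_to_line(w, u, (5.0, 0.0))
# (2) Translating w and u together changes nothing.
moved = dist2_point_to_line((11.0, -5.0), (10.0, -7.0), v)
# (3) Adding a multiple of v to u changes nothing.
shifted = dist2_point_to_line(w, (3.0, 0.0), v)
```

All four numbers come out equal to 4, the square of the height of the point above the x-axis.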
The data that I'm trying to fit will be described by a set of vectors $\vec w_i$, and the sum of the distances squared to the line is

$$D^2 = \sum_1^N (\vec w_i - \vec u)^2 - \sum_1^N \frac{1}{v^2}\big[(\vec w_i - \vec u)\cdot\vec v\big]^2$$

Now to minimize this among all $\vec u$ and $\vec v$ I'll first take advantage of some of the observations from the preceding paragraph. Because the magnitude of $\vec v$ does not matter, I'll make it a unit vector.

$$D^2 = \sum (\vec w_i - \vec u)^2 - \sum \big[(\vec w_i - \vec u)\cdot\hat v\big]^2 \tag{11.55}$$


Now to figure out $\vec u$: Note that I expect the best fit line to go somewhere through the middle of the set of data points, so move the origin to the “center of mass” of the points.

$$\vec w_{\rm mean} = \sum \vec w_i / N \qquad\text{and let}\qquad \vec w'_i = \vec w_i - \vec w_{\rm mean} \qquad\text{and}\qquad \vec u' = \vec u - \vec w_{\rm mean}$$

then the sum $\sum \vec w'_i = 0$ and

$$D^2 = \sum w_i'^2 + Nu'^2 - \sum (\vec w'_i\cdot\hat v)^2 - N(\vec u'\cdot\hat v)^2 \tag{11.56}$$

This depends on four variables, $u'_x$, $u'_y$, $v_x$ and $v_y$. If I have to do derivatives with respect to all of them, so be it, but maybe some geometric insight will simplify the calculation. I can still add any multiple of $\hat v$ to $\vec u'$ without changing this expression. That means that for a given $\hat v$ the derivative of $D^2$ as $\vec u'$ changes in that particular direction is zero. It's only as $\vec u'$ changes perpendicular to the direction of $\hat v$ that $D^2$ changes. The second and fourth terms involve $u'^2 - (\vec u'\cdot\hat v)^2 = u'^2(1 - \cos^2\theta) = u'^2\sin^2\theta$, where this angle $\theta$ is the angle between $\vec u'$ and $\hat v$. This is the perpendicular distance to the line (squared). Call it $u'_\perp = u'\sin\theta$.

$$D^2 = \sum w_i'^2 - \sum (\vec w'_i\cdot\hat v)^2 + Nu'^2 - N(\vec u'\cdot\hat v)^2 = \sum w_i'^2 - \sum (\vec w'_i\cdot\hat v)^2 + Nu_\perp'^2$$

The minimum of this obviously occurs for $u'_\perp = 0$. Also, because the component of $\vec u'$ along the direction of $\hat v$ is arbitrary, I may as well take it to be zero. That makes $\vec u' = 0$. Remember now that this is for the shifted $\vec w'$ data. For the original $\vec w_i$ data, $\vec u$ is shifted to $\vec u = \vec w_{\rm mean}$.

$$D^2 = \sum w_i'^2 - \sum (\vec w'_i\cdot\hat v)^2 \tag{11.57}$$
I'm not done. What is the direction of $\hat v$? That is, I have to find the minimum of $D^2$ subject to the constraint that $|\hat v| = 1$. Use Lagrange multipliers (section 8.12).

$$\text{Minimize}\quad D^2 = \sum w_i'^2 - \sum (\vec w'_i\cdot\vec v)^2 \qquad\text{subject to}\qquad \phi = v_x^2 + v_y^2 - 1 = 0$$

The independent variables are $v_x$ and $v_y$, and the problem becomes

$$\nabla\big(D^2 + \lambda\phi\big) = 0, \qquad\text{with}\qquad \phi = 0$$

Differentiate with respect to the independent variables and you have linear equations for $v_x$ and $v_y$,

$$-\frac{\partial}{\partial v_x}\sum \big(w'_{xi}v_x + w'_{yi}v_y\big)^2 + \lambda 2v_x = 0$$

or

$$\begin{aligned} -\sum 2\big(w'_{xi}v_x + w'_{yi}v_y\big)w'_{xi} + \lambda 2v_x &= 0 \\ -\sum 2\big(w'_{xi}v_x + w'_{yi}v_y\big)w'_{yi} + \lambda 2v_y &= 0 \end{aligned} \tag{11.58}$$


Correlation, Principal Components 

The correlation matrix of this data is

$$(C) = \frac{1}{N}\begin{pmatrix} \sum w_{xi}'^2 & \sum w'_{xi}w'_{yi} \\ \sum w'_{yi}w'_{xi} & \sum w_{yi}'^2 \end{pmatrix}$$

The equations (11.58) are

$$\begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \lambda'\begin{pmatrix} v_x \\ v_y \end{pmatrix} \tag{11.59}$$

where $\lambda' = \lambda/N$. This is a traditional eigenvector equation, and there is a non-zero solution only if the determinant of the coefficients equals zero. Which eigenvalue to pick? There are two of them, and one will give the best fit while the other gives the worst fit. Just because the first derivative is zero doesn't mean you have a minimum of $D^2$; it could be a maximum or a saddle. Here the answer is that you pick the largest eigenvalue. You can see why this is plausible by looking at the special case for which all the data lie along the $x$-axis, then $C_{xx} > 0$ and all the other components of the matrix $= 0$. The eigenvalues are $C_{xx}$ and zero, and the corresponding eigenvectors are $\hat x$ and $\hat y$ respectively. Clearly the best fit corresponds to the former, and the best fit line is the $x$-axis. The general form of the best fit line is (now using the original coordinate system for the data)

$$\alpha\hat v + \vec w_{\rm mean}$$

and this $\hat v$ is the eigenvector having the largest eigenvalue. More generally, look at Eq. (11.57) and you see that the lone negative term is biggest if the $\vec w$s are in the same direction (or opposite) as $\hat v$.



This establishes the best fit to the line in the Euclidean sense. What good is it? It leads into the subject of Principal Component Analysis and of Data Reduction. The basic idea of this scheme is that if this fit is a good one, and the original points lie fairly close to the line that I've found, I can replace the original data with the points on this line. The nine points in this figure require $9 \times 2 = 18$ coordinates to describe their positions. The nine points that approximate the data, but that lie on the line and are closest to the original points require $9 \times 1 = 9$ coordinates along this line. Of course you have some overhead in the data storage because you need to know the line. That takes three more data ($\vec u$ and the angle of $\hat v$), so the total data storage is 12 numbers. See problem 11.38.

This doesn't look like much of a saving, but if you have $10^6$ points you go from 2 000 000 numbers to 1 000 003 numbers, and that starts to be significant. Remember too that this is a two dimensional problem, with two numbers for each point. With more coordinates you will sometimes achieve far greater savings. You can easily establish the equation to solve for the values of $\alpha$ for each point, problem 11.38. The result is

$$\alpha_i = (\vec w_i - \vec u)\cdot\hat v$$
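The whole two-dimensional Euclidean fit fits in a few lines of code. This sketch makes one substitution that is mine and not the text's: instead of solving the eigenvalue equation (11.59) in general, it uses the closed form for a symmetric $2\times2$ matrix, in which the eigenvector of the largest eigenvalue points along the angle $\theta = \frac12\operatorname{atan2}(2C_{xy},\, C_{xx} - C_{yy})$. The function name `euclidean_fit` is hypothetical; the data are those of problem 11.39.

```python
import math

def euclidean_fit(points):
    """Return (u, v, alphas): a point on the best-fit line, its unit direction,
    and the reduced coordinate of each data point along the line."""
    n = len(points)
    mx = sum(p[0] for p in points) / n          # u is the mean of the data
    my = sum(p[1] for p in points) / n
    # Correlation matrix of the shifted data w' = w - w_mean.
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Direction of the eigenvector with the largest eigenvalue (2x2 closed form).
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    v = (math.cos(theta), math.sin(theta))
    # alpha_i = (w_i - u) . v
    alphas = [(p[0] - mx) * v[0] + (p[1] - my) * v[1] for p in points]
    return (mx, my), v, alphas

u, v, alphas = euclidean_fit([(1.0, 1.0), (2.0, 2.0), (3.0, 2.0)])
# The points on the line that replace the original data.
reduced = [(u[0] + a * v[0], u[1] + a * v[1]) for a in alphas]
```

For this data set the results can be compared with the answers quoted in problem 11.39. In more than two dimensions the closed form for the angle no longer applies and you would need a general eigenvalue routine.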


11.8 Differentiating noisy data 

Differentiation involves dividing a small number by another small number. Any errors in the numerator will be magnified 
by this process, and if you have to differentiate experimental data this will always present a difficulty. If it is data from 
the output of a Monte Carlo calculation the same problem will arise. 

Here is a method for differentiation that minimizes the sensitivity of the result to the errors in the input. Assume equally spaced data where each value of the dependent variable $f(x)$ is a random variable with mean $\langle f(x)\rangle$ and variance $\sigma^2$. Follow the procedure for differentiating smooth data and expand in a power series. Let $h = 2k$ and obtain the derivative between data points.

$$\begin{aligned} f(k) &= f(0) + kf'(0) + \tfrac12 k^2f''(0) + \tfrac16 k^3f'''(0) + \cdots \\ f(k) - f(-k) &= 2kf'(0) + \tfrac13 k^3f'''(0) + \cdots \\ f(3k) - f(-3k) &= 6kf'(0) + \tfrac{27}{3}k^3f'''(0) + \cdots \end{aligned}$$

I'll seek a formula of the form

$$f'(0) = \alpha\big[f(k) - f(-k)\big] + \beta\big[f(3k) - f(-3k)\big] \tag{11.60}$$

I am assuming that the variance of $f$ at each point is the same, $\sigma^2$, and that the fluctuations in $f$ at different points are uncorrelated. The last statement is, for random variables $f_1$ and $f_2$,

$$\big\langle (f_1 - \langle f_1\rangle)(f_2 - \langle f_2\rangle)\big\rangle = 0 \qquad\text{which expands to}\qquad \langle f_1f_2\rangle = \langle f_1\rangle\langle f_2\rangle \tag{11.61}$$

Insert the preceding series expansions into Eq. (11.60) and match the coefficients of $f'(0)$. This gives an equation for $\alpha$ and $\beta$:

$$2k\alpha + 6k\beta = 1 \tag{11.62}$$

One way to obtain another equation for $\alpha$ and $\beta$ is to require that the $k^3f'''(0)$ term vanish; this leads back to the old formulas for differentiation, Eq. (11.11). Instead, require that the variance of $f'(0)$ be a minimum.

$$\begin{aligned} \big\langle (f'(0) - \langle f'(0)\rangle)^2\big\rangle &= \Big\langle \big[\alpha\big(f(k) - \langle f(k)\rangle\big) - \alpha\big(f(-k) - \langle f(-k)\rangle\big) + \cdots\big]^2\Big\rangle \\ &= 2\sigma^2\alpha^2 + 2\sigma^2\beta^2 \end{aligned} \tag{11.63}$$

This comes from the fact that the correlation between say $f(k)$ and $f(-3k)$ vanishes, and that all the individual variances are $\sigma^2$. That is,

$$\Big\langle \big(f(k) - \langle f(k)\rangle\big)\big(f(-k) - \langle f(-k)\rangle\big)\Big\rangle = 0$$

along with all the other cross terms. Problem: minimize $2\sigma^2(\alpha^2 + \beta^2)$ subject to the constraint $2k\alpha + 6k\beta = 1$. It's hardly necessary to resort to Lagrange multipliers for this problem.

Eliminate $\alpha$:

$$\frac{d}{d\beta}\left[\left(\frac{1}{2k} - 3\beta\right)^2 + \beta^2\right] = 0 \implies -6\left(\frac{1}{2k} - 3\beta\right) + 2\beta = 0 \implies \beta = 3/20k, \quad \alpha = 1/20k$$

$$f'(.5h) \approx \frac{-3f(-h) - f(0) + f(h) + 3f(2h)}{10h} \tag{11.64}$$

and the variance is $2\sigma^2(\alpha^2 + \beta^2) = \sigma^2/5h^2$. In contrast, the formula for the variance in the standard four point differentiation formula Eq. (11.10), where the truncation error is least, is $65\sigma^2/72h^2$, which is 4.5 times larger.

When the data is noisy, and most data is, this expression will give much better results for this derivative. Can you 
do even better? Of course. You can for example go to higher order and both decrease the truncation error and minimize 
the statistical error. See problem 11.22. 
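The formula (11.64) and the variance comparison are easy to check numerically. In this sketch the function names are hypothetical, and the weights used for the standard four point formula, $[f(-2h) - 8f(-h) + 8f(h) - f(2h)]/12h$, are my assumption about the form of Eq. (11.10); they do reproduce the quoted variance $65\sigma^2/72h^2$.

```python
def noisy_midpoint_derivative(f_m1, f_0, f_1, f_2, h):
    """Eq. (11.64): estimate f'(h/2) from f(-h), f(0), f(h), f(2h)."""
    return (-3.0 * f_m1 - f_0 + f_1 + 3.0 * f_2) / (10.0 * h)

def variance_factor(weights):
    """Variance of sum(w_i * f_i) in units of sigma^2, for uncorrelated
    data that all have the same variance."""
    return sum(w * w for w in weights)

h = 0.1

def f(x):
    return x * x                 # test function; the exact f'(h/2) is h

estimate = noisy_midpoint_derivative(f(-h), f(0.0), f(h), f(2 * h), h)

# Weights of Eq. (11.64).
min_var = variance_factor([-3 / (10 * h), -1 / (10 * h), 1 / (10 * h), 3 / (10 * h)])
# Assumed weights of the standard four point formula at a data point.
std_var = variance_factor([1 / (12 * h), -8 / (12 * h), 8 / (12 * h), -1 / (12 * h)])
ratio = std_var / min_var        # should be close to the quoted 4.5
```

The price of the smaller variance is a larger truncation error, since the $k^3f'''(0)$ term is no longer cancelled.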

11.9 Partial Differential Equations 

I'll illustrate the ideas involved here and the difficulties that occur in even the simplest example of a PDE, a first order 
constant coefficient equation in one space dimension 

$$\partial u/\partial t + c\,\partial u/\partial x = u_t + cu_x = 0, \tag{11.65}$$

where the subscript denotes differentiation with respect to the respective variables. This is a very simple sort of wave equation. Given the initial condition that at $t = 0$, $u(0,x) = f(x)$, you can easily check that the solution is

$$u(t,x) = f(x - ct) \tag{11.66}$$

The simplest scheme to carry data forward in time from the initial values is a generalization of Euler's method for ordinary differential equations

$$\begin{aligned} u(t + \Delta t, x) &= u(t,x) + u_t(t,x)\Delta t \\ &= u(t,x) - u_x(t,x)\,c\Delta t \\ &= u(t,x) - \frac{c\Delta t}{2\Delta x}\big[u(t, x + \Delta x) - u(t, x - \Delta x)\big]. \end{aligned} \tag{11.67}$$

Here, to evaluate the derivative, I used the two point differentiation formula.

In this equation, the value of $u$ at point $(\Delta t, 4\Delta x)$ depends on the values at $(0, 3\Delta x)$, $(0, 4\Delta x)$, and $(0, 5\Delta x)$. This diagram shows the scheme as a picture, with the horizontal axis being $x$ and the vertical axis $t$. You march the values of $u$ at the grid points forward in time (or backward) by a set of simple equations.

The difficulties in this method are the usual errors, and more importantly, the instabilities that can occur. The 
errors due to the approximations involved can be classified in this case by how they manifest themselves on wavelike 
solutions. They can lead to dispersion or dissipation. 
Analyze the dispersion first. Take as initial data $u(0,x) = A\cos kx$ (or if you prefer, $e^{ikx}$). The exact solution will be $A\cos(kx - \omega t)$ where $\omega = ck$. Now analyze the effect of the numerical scheme. If $\Delta t$ is very small, using the discrete values of $\Delta x$ in the iteration gives an approximate equation

$$u_t = -\frac{c}{2\Delta x}\big[u(t, x + \Delta x) - u(t, x - \Delta x)\big]$$

A power series expansion in $\Delta x$ gives, for the first two non-vanishing terms

$$u_t = -c\left[u_x + \frac16(\Delta x)^2u_{xxx}\right] \tag{11.68}$$

So, though I started off solving one equation, the numerical method more nearly represents quite a different equation. Try a solution of the form $A\cos(kx - \omega t)$ in this equation and you get

$$\omega = c\left[k - \frac16(\Delta x)^2k^3\right], \tag{11.69}$$

and you have dispersion of the wave. The velocity of the wave, $\omega/k$, depends on $k$ and so it depends on its wavelength or frequency.
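As a check on Eq. (11.69), here is the substitution written out. It uses only the trial solution $u = A\cos(kx - \omega t)$ and the modified equation $u_t = -c\big[u_x + \frac16(\Delta x)^2u_{xxx}\big]$ of Eq. (11.68):

$$u_t = \omega A\sin(kx - \omega t), \qquad u_x = -kA\sin(kx - \omega t), \qquad u_{xxx} = k^3A\sin(kx - \omega t)$$

$$\omega A\sin(kx - \omega t) = -c\left[-k + \tfrac16(\Delta x)^2k^3\right]A\sin(kx - \omega t) \implies \omega = c\left[k - \tfrac16(\Delta x)^2k^3\right]$$

The phase velocity is then $\omega/k = c\big[1 - \frac16(k\Delta x)^2\big]$, so in this approximation the short wavelengths travel more slowly than the long ones.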

The problem of instabilities is more conveniently analyzed by the use of an initial condition $u(0,x) = e^{ikx}$, then Eq. (11.67) is

$$\begin{aligned} u(\Delta t, x) &= e^{ikx} - \frac{c\Delta t}{2\Delta x}\left[e^{ik(x + \Delta x)} - e^{ik(x - \Delta x)}\right] \\ &= e^{ikx}\left[1 - \frac{ic\Delta t}{\Delta x}\sin k\Delta x\right] \end{aligned} \tag{11.70}$$

The $n$-fold iteration of this therefore involves just the $n^{\rm th}$ power of the bracketed expression; that's why the exponential form is easier to use in this case. If $k\Delta x$ is small, the first term in the expansion of the sine says that this is approximately

$$e^{ikx}\big[1 - ikc\Delta t\big]^n,$$

and with small $\Delta t$ and $n = t/\Delta t$ a large number, this is

$$e^{ikx}\left[1 - \frac{ikct}{n}\right]^n \approx e^{ik(x - ct)}$$
Looking more closely though, the object in brackets in Eq. (11.70) has magnitude

$$r = \left[1 + \frac{c^2(\Delta t)^2}{(\Delta x)^2}\sin^2k\Delta x\right]^{1/2} > 1 \tag{11.71}$$

so the magnitude of the solution grows exponentially. This instability can be pictured as a kind of negative dissipation. This growth is reduced by requiring $kc\Delta t \ll 1$.

Given a finite fixed time interval, is it possible to get there with arbitrary accuracy by making $\Delta t$ small enough? With $n$ steps $= t/\Delta t$, $r^n$ is

$$\begin{aligned} r^n &= \left[1 + \frac{c^2(\Delta t)^2}{(\Delta x)^2}\sin^2k\Delta x\right]^{t/2\Delta t} = \big[1 + \alpha\big]^\beta \\ &= \Big[\big[1 + \alpha\big]^{1/\alpha}\Big]^{\alpha\beta} \approx e^{\alpha\beta} \\ &= \exp\left[\frac{c^2t\Delta t}{2(\Delta x)^2}\sin^2k\Delta x\right] \end{aligned}$$

so by shrinking $\Delta t$ sufficiently, this is arbitrarily close to one.
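The growth predicted by Eq. (11.71) is easy to see by simply iterating Eq. (11.67). This is a sketch under assumptions of my own: a periodic grid of 64 points, a single Fourier mode as initial data, and the ratio $c\Delta t/\Delta x$ (called `nu` here) set to 0.5. None of these choices come from the text.

```python
import cmath
import math

def ftcs_step(u, nu):
    """One step of Eq. (11.67) on a periodic grid; nu = c*dt/dx."""
    m = len(u)
    # u[j - 1] with j = 0 wraps around to the last point, as does (j + 1) % m.
    return [u[j] - 0.5 * nu * (u[(j + 1) % m] - u[j - 1]) for j in range(m)]

m, mode, nu, steps = 64, 4, 0.5, 100
theta = 2.0 * math.pi * mode / m                    # this is k*dx
u = [cmath.exp(1j * theta * j) for j in range(m)]   # u(0, x) = exp(ikx)
for _ in range(steps):
    u = ftcs_step(u, nu)

amplitude = abs(u[0])                               # started at exactly 1
r = math.sqrt(1.0 + nu ** 2 * math.sin(theta) ** 2) # Eq. (11.71)
predicted = r ** steps
```

With these numbers the amplitude grows by roughly a factor of six in 100 steps, in agreement with $r^n$, even though each single step multiplies the wave by a factor only slightly larger than one.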

There are several methods to avoid some of these difficulties. One is the Lax-Friedrichs method:

$$u(t + \Delta t, x) = \frac12\big[u(t, x + \Delta x) + u(t, x - \Delta x)\big] - \frac{c\Delta t}{2\Delta x}\big[u(t, x + \Delta x) - u(t, x - \Delta x)\big] \tag{11.72}$$

By appropriate choice of $\Delta t$ and $\Delta x$, this will have $r \le 1$, causing a dissipation of the wave. Another scheme is the Lax-Wendroff method.

$$\begin{aligned} u(t + \Delta t, x) = u(t,x) &- \frac{c\Delta t}{2\Delta x}\big[u(t, x + \Delta x) - u(t, x - \Delta x)\big] \\ &+ \frac{c^2(\Delta t)^2}{2(\Delta x)^2}\big[u(t, x + \Delta x) - 2u(t,x) + u(t, x - \Delta x)\big] \end{aligned} \tag{11.73}$$

This keeps one more term in the power series expansion.
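Both schemes can be tried on a single Fourier mode to see the dissipation directly. As before this is only a sketch: the periodic grid, the mode number, and the value $c\Delta t/\Delta x = 0.5$ (called `nu`) are my own choices, and the restriction $c\Delta t/\Delta x \le 1$ is the usual stability condition for these methods rather than something stated in the text.

```python
import cmath
import math

def lax_friedrichs_step(u, nu):
    """One step of Eq. (11.72) on a periodic grid; nu = c*dt/dx."""
    m = len(u)
    return [0.5 * (u[(j + 1) % m] + u[j - 1])
            - 0.5 * nu * (u[(j + 1) % m] - u[j - 1])
            for j in range(m)]

def lax_wendroff_step(u, nu):
    """One step of Eq. (11.73) on a periodic grid; nu = c*dt/dx."""
    m = len(u)
    return [u[j]
            - 0.5 * nu * (u[(j + 1) % m] - u[j - 1])
            + 0.5 * nu ** 2 * (u[(j + 1) % m] - 2.0 * u[j] + u[j - 1])
            for j in range(m)]

def amplitude_after(step, nu, steps=100, m=64, mode=4):
    """Amplitude of the mode exp(ikx), initially 1, after the given number of steps."""
    theta = 2.0 * math.pi * mode / m                 # k*dx
    u = [cmath.exp(1j * theta * j) for j in range(m)]
    for _ in range(steps):
        u = step(u, nu)
    return abs(u[0])

lf = amplitude_after(lax_friedrichs_step, 0.5)
lw = amplitude_after(lax_wendroff_step, 0.5)
```

Neither amplitude grows. With these parameters Lax-Friedrichs damps the wave very strongly, while Lax-Wendroff loses only a few percent over the same 100 steps.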
Exercises

1 Use the four-point interpolation formula Eq. (11.3) to estimate $e^{3/4}$ from the values of $e^x$ at 0, 1/2, 1, 3/2. From the known value of the number, compute the relative error.

2 Find a root of the equation $\cos x = x$. Start with a graph of course.

3 Find the values of $\alpha$ and of $x$ for which $e^x = \alpha x$ has a single root.

4 Find the roots of $e^x = \alpha x$ when $\alpha$ is twice the value found in the preceding exercise. (and where is your graph?)

5 Use (a) midpoint formula and (b) Simpson's rule, with two intervals in each case, to evaluate $4\int_0^1 dx\,1/(1 + x^2)$.

6 Use Newton's method to solve the equation $\sin x = 0$, starting with the initial guess $x_0 = 3$.
Problems

11.1 Show that a two point extrapolation formula is

$$f(0) \approx 2f(-h) - f(-2h) + h^2f''(0).$$

11.2 Show that a three point extrapolation formula is

$$f(0) \approx 3f(-h) - 3f(-2h) + f(-3h) + h^3f'''(0).$$

11.3 Solve $x^2 - a = 0$ by Newton's method, showing graphically that in this case, no matter what the initial guess is (positive or negative), the sequence will always converge. Draw graphs. Find $\sqrt2$. (This is the basis for the library square root algorithm on some computers.)

11.4 Find all real roots of $e^{-x} = \sin x$ to $\pm10^{-4}$. Ans: 0.588533, $\pi - 0.045166$, $2\pi + 0.00187$ . . .

11.5 The first root $r_1$ of $e^{-ax} = \sin x$ is a function of the variable $a > 0$. Find $dr_1/da$ at $a = 1$ by two means. (a) First find $r_1$ for some values of $a$ near 1 and use a four-point differentiation formula. (b) Second, use analytical techniques on the equation to solve for $dr_1/da$ and evaluate the derivative in terms of the known value of the root from the previous problem.

11.6 Evaluate $\operatorname{erf}(1) = \frac{2}{\sqrt\pi}\int_0^1 dt\,e^{-t^2}$. Ans: 0.842736 (more exact: 0.842700792949715)

11.7 The principal value of an integral is ($a < x_0 < b$)

$$P\!\int_a^b \frac{f(x)}{x - x_0}\,dx = \lim_{\epsilon\to0}\left[\int_a^{x_0 - \epsilon} \frac{f(x)}{x - x_0}\,dx + \int_{x_0 + \epsilon}^b \frac{f(x)}{x - x_0}\,dx\right]$$

(a) Show that an equal spaced integration scheme to evaluate such an integral is (using points $0$, $\pm h$)

$$P\!\int_{-h}^{+h} \frac{f(x)}{x}\,dx = f(h) - f(-h) - \frac29 h^3f'''(0).$$

(b) Also, an integration scheme of the Gaussian type is

$$\sqrt3\big[f(h/\sqrt3) - f(-h/\sqrt3)\big] + \frac{h^5}{675}f^{v}(0).$$

11.8 Devise a two point Gaussian integration with errors for the class of integrals

$$\int_{-\infty}^{+\infty} dx\,e^{-x^2}f(x).$$

Find what standard polynomial has roots at the points where $f$ is to be evaluated.
Ans: $\frac12\sqrt\pi\big[f(-1/\sqrt2) + f(1/\sqrt2)\big]$

11.9 Same as the preceding problem, but make it a three point method.

Ans: $\sqrt\pi\big[\frac16 f\big(-\frac12\sqrt6\big) + \frac23 f(0) + \frac16 f\big(+\frac12\sqrt6\big)\big]$

11.10 Find one and two point Gauss methods for

$$\int_0^\infty dx\,e^{-x}f(x).$$

(a) Solve the one point method completely.

(b) For the two point case, the two points are roots of the equation $1 - 2x + \frac12x^2 = 0$. Use that as given to find the weights. Look up Laguerre polynomials.

11.11 In numerical differentiation it is possible to choose the interval too small. Every computation is done to a finite precision. (a) Do the simplest numerical differentiation of some specific function and take smaller and smaller intervals. What happens when the interval gets very small? (b) To analyze the reason for this behavior, assume that every number in the two point differentiation formula is kept to a fixed number of significant figures (perhaps 7 or 8). How does the error vary with the interval? What interval gives the most accurate answer? Compare this theoretical answer with the experimental value found in the first part of the problem.

11.12 Just as in the preceding problem, the same phenomenon caused by roundoff errors occurs in integration. For 
any of the integration schemes discussed here, analyze the dependence on the number of significant figures kept and 
determine the most accurate interval. (Surprise?) 
11.13 Compute the solution of $y' = 1 + y^2$ and check the numbers in the table where that example was given, Eq. (11.39).

11.14 If in the least square fit to a linear combination of functions, the result is constrained to pass through one point, so that $\sum a_\mu f_\mu(x_0) = K$ is a requirement on the $a$'s, show that the result becomes

$$a = C^{-1}\big[b + \lambda f_0\big],$$

where $f_0$ is the vector $f_\mu(x_0)$ and $\lambda$ satisfies

$$\lambda\big\langle f_0, C^{-1}f_0\big\rangle = K - \big\langle f_0, C^{-1}b\big\rangle.$$

11.15 Find the variances in the formulas Eq. (11.8) and (11.10) for $f'$, assuming noisy data.

Ans: $\sigma^2/2h^2$, $65\sigma^2/72h^2$

11.16 Derive Eqs. (11.61), (11.62), and (11.63).

11.17 The Van der Pol equation arises in (among other places) nonlinear circuits and leads to self-exciting oscillations as in multi-vibrators

$$\frac{d^2x}{dt^2} - \epsilon(1 - x^2)\frac{dx}{dt} + x = 0.$$

Take $\epsilon = .3$ and solve subject to any non-zero initial conditions. Solve over many periods to demonstrate the development of the oscillations.

11.18 Find a formula for the numerical third derivative. Cf. Eq. (2.18)

11.19 The equation resulting from the secant method, Eq. (11.7), can be simplified by placing everything over a common denominator, $\big(f(x_2) - f(x_1)\big)$. Explain why this is a bad thing to do, how it can lead to inaccuracies.

11.20 Rederive the first Gauss integration formula Eq. (11.25) without assuming the symmetry of the result

$$\int_{-h}^{+h} f(x)\,dx \approx \alpha f(\beta) + \gamma f(\delta).$$


11.21 Derive the coefficients for the stable two-point Adams method. 
11.22 By putting in one more parameter in the differentiation algorithm for noisy data, it is possible both to minimize the variance in $f'$ and to eliminate the error terms in $h^2f'''$. Find such a 6-point formula for the derivatives halfway between data points OR one for the derivatives at the data points (with errors and variance).

Ans: $f'(0) = \big[58\big(f(h) - f(-h)\big) + 67\big(f(2h) - f(-2h)\big) - 22\big(f(3h) - f(-3h)\big)\big]/(252h)$

11.23 In the same spirit as the method for differentiating noisy data, how do you interpolate noisy data? That is, use some extra points to stabilize the interpolation against random variations in the data. To be specific, do a midpoint interpolation for equally spaced points. Compare the variance here to that in Eq. (11.3). Ans: $f(0) \approx \big[f(-3k) + f(-k) + f(k) + f(3k)\big]/4$, $\sigma^2$ is 4.8 times smaller

11.24 Find the dispersion resulting from the use of a four point formula for $u_x$ in the numerical solution of the PDE $u_t + cu_x = 0$.

11.25 Find the exact dispersion resulting from the equation

$$u_t = -c\big[u(t, x + \Delta x) - u(t, x - \Delta x)\big]/2\Delta x.$$

That is, don't do the series expansion on $\Delta x$.

11.26 Compute the dispersion and the dissipation in the Lax-Friedrichs and in the Lax-Wendroff methods.

11.27 In the simple iteration method of Eq. (11.71), if the grid points are denoted $x = m\Delta x$, $t = n\Delta t$, where $n$ and $m$ are integers ($-\infty < n, m < +\infty$), the result is a linear, constant-coefficient, partial difference equation. Solve subject to the initial condition

$$u(0, m) = e^{ikm\Delta x}.$$

11.28 Lobatto integration is like Gaussian integration, except that you require the end-points of the interval to be included in the sum. The interior points are left free. Three point Lobatto is the same as Simpson; find the four point Lobatto formula. The points found are roots of $P'_{n-1}$.

11.29 From the equation $y' = f(x,y)$, one derives $y'' = f_x + ff_y$. Derive a two point Adams type formula using the first and second derivatives, with error of order $h^5$ as for the standard four-point expression. This is useful when the analytic derivatives are easy. The form is

$$y(0) = y(-h) + \beta_1y'(-h) + \beta_2y'(-2h) + \gamma_1y''(-h) + \gamma_2y''(-2h)$$

Ans: $\beta_1 = -h/2$, $\beta_2 = 3h/2$, $\gamma_1 = 17h^2/12$, $\gamma_2 = 7h^2/12$

11.30 Using the same idea as in the previous problem, find a differential equation solver in the spirit of the original Euler method, (11.32), but doing a parabolic extrapolation instead of a linear one. That is, start from $(x_0, y_0)$ and fit the initial data to $y = \alpha + \beta(x - x_0) + \gamma(x - x_0)^2$ in order to take a step. Ans: $y(h) = y_0 + hf(0, y_0) + (h^2/2)\big[f_x(0, y_0) + f_y(0, y_0)f(0, y_0)\big]$

11.31 Show that the root finding algorithm of Eq. (11.7) is valid for analytic functions of a complex variable with complex roots.

11.32 In the Runge-Kutta method, pick one of the other choices to estimate $D_2f(0, y_0)$ in Eq. (11.37). How many function evaluations will it require at each step?

11.33 Sometimes you want an integral where the data is known outside the domain of integration. Find an integration scheme for $\int_0^h f(x)\,dx$ in terms of $f(h)$, $f(0)$, and $f(-h)$. Ans: $\big[-f(-h) + 8f(0) + 5f(h)\big]h/12$, error $\propto h^4$

11.34 When you must subtract two quantities that are almost the same size, you can find yourself trying to carry ridiculously many significant figures in intermediate steps. If $a$ and $b$ are very close and you want to evaluate $\sqrt a - \sqrt b$, devise an algorithm that does not necessitate carrying square roots out to many more places than you want in the final answer. Write $a = b + \epsilon$.

Ans: $\epsilon/2\sqrt b$, error: $\epsilon^2/8b^{3/2}$

11.35 Repeat the preceding problem but in a more symmetric fashion. Write $a = x + \epsilon$ and $b = x - \epsilon$. Compare the sizes of the truncation errors. Ans: $\epsilon/\sqrt x$, $-\epsilon^3/8x^{5/2}$

11.36 The value of $\pi$ was found in the notes by integrating $4/(1 + x^2)$ from zero to one using Simpson's rule and five points. Do the same calculation using Gaussian integration and two points. Ans: 3.14754 (Three points give 3.14107)

11.37 (a) Derive Eq. (11.54). 

(b) Explain why the plausibility arguments that follow it actually say something. 

11.38 After you've done the Euclidean fit of data to a straight line and you want to do the data reduction described after Eq. (11.59), you have to find the coordinate along the line of the best fit to each point. This is essentially the problem: Given the line ($\vec u$ and $\hat v$) and a point ($\vec w$), the new reduced coordinate is the $\alpha$ in $\vec u + \alpha\hat v$ so that this point is closest to $\vec w$. What is it? You can do this the hard way, with a lot of calculus and algebra, or you can draw a picture and write the answer down.

11.39 Data is given as $(x_i, y_i) = \{(1,1), (2,2), (3,2)\}$. Compute the Euclidean best fit line. Also find the coordinates, $\alpha_i$, along this line and representing the reduced data set.

Ans: $\vec u = (2, 5/3)$, $\hat v = (0.88167, 0.47186)$, $\alpha_1 = -1.1962$, $\alpha_2 = 0.1573$, $\alpha_3 = 1.0390$
The approximate points are (0.945, 1.102), (2.139, 1.741), (2.916, 2.157)

[It may not warrant this many significant figures, but it should make it easier to check your work.]

11.40 In the paragraph immediately following Eq. (11.23) there's mention of an alternate way to derive Simpson's rule. Carry it out, though you already know the answer.

11.41 (a) Derive the formula for the second derivative in terms of function values at three equally spaced points. (b) Use five points to get the second derivative, but using the extra data to minimize the sensitivity to noise in the data.
Ans: (a) $\big[f(-h) - 2f(0) + f(h)\big]/h^2$ (b) $\big[-2f(0) - \big(f(h) + f(-h)\big) + 2\big(f(2h) + f(-2h)\big)\big]/7h^2$. The standard deviation for (b) is smaller than for (a) by a factor $21^{1/2} = 4.6$


Tensors 


You can't walk across a room without using a tensor (the pressure tensor). You can’t align the wheels on your car 
without using a tensor (the inertia tensor). You definitely can't understand Einstein's theory of gravity without using 
tensors (many of them). 

This subject is often presented in the same language in which it was invented in the 1890's, expressing it in terms of transformations of coordinates and saturating it with such formidable-looking combinations as $\partial x^i/\partial\bar x^j$. This is really a sideshow to the subject, one that I will steer around, though a connection to this aspect appears in section 12.8.

Some of this material overlaps that of chapter 7, but I will extend it in a different direction. The first examples 
will then be familiar. 


12.1 Examples 

A tensor is a particular type of function. Before presenting the definition, some examples will clarify what I mean. Start with a rotating rigid body, and compute its angular momentum. Pick an origin and assume that the body is made up of $N$ point masses $m_i$ at positions described by the vectors $\vec r_i$ ($i = 1, 2, \ldots, N$). The angular velocity vector is $\vec\omega$. For each mass the angular momentum is $\vec r_i \times \vec p_i = \vec r_i \times (m_i\vec v_i)$. The velocity $\vec v_i$ is given by $\vec\omega \times \vec r_i$ and so the angular momentum of the $i^{\rm th}$ particle is $m_i\vec r_i \times (\vec\omega \times \vec r_i)$. The total angular momentum is therefore

$$\vec L = \sum_{i=1}^N m_i\,\vec r_i \times (\vec\omega \times \vec r_i) \tag{12.1}$$

The angular momentum, $\vec L$, will depend on the distribution of mass within the body and upon the angular velocity. Write this as

$$\vec L = I(\vec\omega),$$

where the function $I$ is called the tensor of inertia.

For a second example, take a system consisting of a mass suspended by six springs. At equilibrium the springs are perpendicular to each other. If now a (small) force $\vec F$ is applied to the mass it will undergo a displacement $\vec d$. Clearly, if $\vec F$ is along the direction of any of the springs then the displacement $\vec d$ will be in the same direction as $\vec F$. Suppose however that $\vec F$ is halfway between the $k_1$ and $k_2$ springs, and further that the spring $k_2$ was taken from a railroad locomotive while $k_1$ is a watch spring. Obviously in this case $\vec d$ will be mostly in the $x$ direction ($k_1$) and is not aligned with $\vec F$. In any case there is a relation between $\vec d$ and $\vec F$,

$$\vec d = f(\vec F). \tag{12.2}$$

The function $f$ is a tensor.

In both of these examples, the functions involved were vector valued functions of vector variables. They have the further property that they are linear functions, i.e. if $\alpha$ and $\beta$ are real numbers,

$$I(\alpha\vec\omega_1 + \beta\vec\omega_2) = \alpha I(\vec\omega_1) + \beta I(\vec\omega_2), \qquad f(\alpha\vec F_1 + \beta\vec F_2) = \alpha f(\vec F_1) + \beta f(\vec F_2)$$

These two properties are the first definition of a tensor. (A generalization will come later.) There's a point here that will probably cause some confusion. Notice that in the equation $\vec L = I(\vec\omega)$, the tensor is the function $I$. I didn't refer to “the function $I(\vec\omega)$” as you commonly see. The reason is that $I(\vec\omega)$, which equals $\vec L$, is a vector, not a tensor. It is the output of the function $I$ after the independent variable $\vec\omega$ has been fed into it. For an analogy, retreat to the case of a real valued function of a real variable. In common language, you would look at the equation $y = f(x)$ and say that $f(x)$ is a function, but it's better to say that $f$ is a function, and that $f(x)$ is the single number obtained by feeding the number $x$ to $f$ in order to obtain the number $f(x)$. In this language, $f$ is regarded as containing a vast amount of information, all the relations between $x$ and $y$. $f(x)$ however is just a single number. Think of $f$ as the whole graph of the function and $f(x)$ as telling you one point on the graph. This apparently trivial distinction will often make no difference, but there are a number of cases (particularly here) where a misunderstanding of this point will cause confusion.
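The distinction between the function $I$ and the vector $I(\vec\omega)$ is one that a programming language makes for you. In this sketch (the names `make_inertia` and `cross`, and the two masses, are hypothetical), `I` is a function object built from the mass distribution, `I(w1)` is a single vector, and the linearity property is checked for one pair of angular velocities.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_inertia(masses, positions):
    """Return the function I that maps omega to L = sum of m r x (omega x r), Eq. (12.1)."""
    def inertia(omega):
        total = [0.0, 0.0, 0.0]
        for m, r in zip(masses, positions):
            term = cross(r, cross(omega, r))
            for i in range(3):
                total[i] += m * term[i]
        return tuple(total)
    return inertia

# The tensor is the function I; it holds the whole mass distribution.
I = make_inertia([1.0, 2.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0)])

w1, w2 = (0.0, 0.0, 1.0), (1.0, 2.0, 0.0)
L1 = I(w1)                       # a vector, not a tensor

# Linearity: I(a*w1 + b*w2) should equal a*I(w1) + b*I(w2).
a, b = 2.0, -3.0
lhs = I(tuple(a * x + b * y for x, y in zip(w1, w2)))
rhs = tuple(a * x + b * y for x, y in zip(I(w1), I(w2)))
```

For this mass distribution `L1` is not parallel to `w1`, the same behavior as the displacement and the force in the spring example.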

Definition of “Function” 

This is a situation in which a very abstract definition of an idea will allow you to understand some fairly concrete 
applications far more easily. 

Let $X$ and $Y$ denote sets (possibly the same set) and $x$ and $y$ are elements of these sets ($x \in X$, $y \in Y$). Form a new set $F$ consisting of some collection of ordered pairs of elements, one from $X$ and one from $Y$. That is, a typical element of the set $F$ is $(x_1, y_1)$ where $x_1 \in X$ and $y_1 \in Y$. Such a set is called a “relation” between $X$ and $Y$.

If $X$ is the set of real numbers and $Y$ is the set of complex numbers, examples of relations are the sets

$$F_1 = \{(1.0,\ 7.3 - 2.1i),\ (-\pi,\ e + i\sqrt2),\ (3.2\ \text{googol},\ 0. + 0.i),\ (1.0,\ e - i\pi)\}$$

$$F_2 = \{(x, z) \mid z^2 = 1 - x^2 \text{ and } -2 < x < 1\}$$

There are four elements in the first of these relations and an infinite number in the second. A relation is not necessarily a function, as you need one more restriction. To define a function, you need it to be single-valued. That is the requirement that if $(x, y_1) \in F$ and $(x, y_2) \in F$ then $y_1 = y_2$. The ordinary notation for a function is $y = F(x)$, and in the language of sets we say $(x, y) \in F$. The set $F$ is the function. You can picture it as a graph, containing all the information about the function; it is by definition single-valued. You can check that neither of the relations above is a function: as written, $F_1$ pairs $x = 1.0$ with two different values, and $F_2$ assigns two values of $z$ to most values of $x$.

[Figure: the circle $x^2 + y^2 = R^2$ beside the semicircle $y = \sqrt{R^2 - x^2}$]

For the real numbers $x$ and $y$, $x^2 + y^2 = R^2$ defines a relation between $X$ and $Y$, but $y = \sqrt{R^2 - x^2}$ is a function. In the former case for each $x$ in the interval $-R < x < R$ you have two $y$'s, $\pm\sqrt{R^2 - x^2}$. In the latter case there is only one $y$ for each $x$. The domain of a function is the set of elements $x$ such that there is a $y$ with $(x, y) \in F$. The range is the set of $y$ such that there is an $x$ with $(x, y) \in F$. For example, $-R \le x \le R$ is the domain of $y = \sqrt{R^2 - x^2}$ and $0 \le y \le R$ is its range.
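For a relation with a finite number of elements, the definitions of single-valued, domain, and range translate directly into code. This is a small sketch with made-up relations `G1` and `G2`; the function names are my own.

```python
def is_function(relation):
    """True if a finite relation (a set of (x, y) pairs) is single-valued."""
    seen = {}
    for x, y in relation:
        if x in seen and seen[x] != y:
            return False         # the same x is paired with two different y's
        seen[x] = y
    return True

def domain(relation):
    """The set of x for which some (x, y) is in the relation."""
    return {x for x, _ in relation}

def range_of(relation):
    """The set of y for which some (x, y) is in the relation."""
    return {y for _, y in relation}

G1 = {(1, 2), (3, 4), (5, 2)}    # two x's may share a y; still a function
G2 = {(1, 2), (3, 4), (1, 5)}    # x = 1 appears with two y's; not a function
```

Here `is_function(G1)` is true and `is_function(G2)` is false, while `domain(G1)` is `{1, 3, 5}` and `range_of(G1)` is `{2, 4}`. The set of pairs is the function; nothing else needs to be stored.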

Equation (12.1) defines a function $I$. The set $X$ is the set of angular velocity vectors, and the set $Y$ is the set of angular momentum vectors. For each of the former you have exactly one of the latter. The function is the set of all the pairs of input and output variables, so you can see why I don't want to call $I(\vec\omega)$ a function: it's a vector, $\vec L$.

Another physical example of a tensor is the polarizability tensor relating the electric dipole moment density vector $\vec P$ of matter to an applied electric field vector $\vec E$:

$$\vec P = \alpha(\vec E)$$

For the vacuum this is zero. More generally, for an isotropic linear medium, this function is nothing more than multiplication by a scalar,

$$\vec P = \alpha\vec E$$

In a crystal however the two fields P and E are not in the same direction, though the relation between them is still 
linear for small fields. This is analogous to the case above with a particle attached to a set of springs. The electric field 
polarizes the crystal more easily in some directions than in others. 

The stress-strain relation in a crystal is a more complex situation that can also be described in terms of tensors. 
When a stress is applied, the crystal will distort slightly and this relation of strain to stress is, for small stress, a linear 
one. You will be able to use the notion of a tensor to describe what happens. In order to do this however it will be 
necessary to expand the notion of “tensor" to include a larger class of functions. This generalization will require some 
preliminary mathematics. 
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Functional 

Terminology: A functional is a real (scalar) valued function of one or more vector variables. In particular, a linear 
functional is a function of one vector variable satisfying the linearity requirement. 

f(odh + /3v 2 ) = af(v i) + Pf{y 2 ). (12.3) 

A simple example of such a functional is 

$$f(\vec v) = \vec A\cdot\vec v, \tag{12.4}$$

where $\vec A$ is a fixed vector. In fact, because of the existence of a scalar product, all linear functionals are of this form, a 
result that is embodied in the following theorem, the representation theorem for linear functionals in finite dimensions. 

Let $f$ be a linear functional: that is, $f$ is a scalar valued function of one vector variable and is linear 
in that variable, $f(\vec v)$ is a real number and 

$$f(\alpha\vec v_1 + \beta\vec v_2) = \alpha f(\vec v_1) + \beta f(\vec v_2), \tag{12.5}$$

then there is a unique vector, $\vec A$, such that $f(\vec v) = \vec A\cdot\vec v$ for all $\vec v$. 

Now obviously the function defined by $\vec A\cdot\vec v$, where $\vec A$ is a fixed vector, is linear. The burden of this theorem is 
that all linear functionals are of precisely this form. 

There are various approaches to proving this. The simplest is to write the vectors in components and to compute 
with those. There is also a more purely geometric method that avoids using components. The latter may be more 
satisfying, but it's harder. I'll pick the easy way. 

To show that $f(\vec v)$ can always be written as $\vec A\cdot\vec v$, I have to construct the vector $\vec A$. If this is to work it has to 
work for all vectors $\vec v$, and that means that it has to work for every particular vector such as $\hat x$, $\hat y$, and $\hat z$. It must be that 

$$f(\hat x) = \vec A\cdot\hat x \quad\text{and}\quad f(\hat y) = \vec A\cdot\hat y \quad\text{and}\quad f(\hat z) = \vec A\cdot\hat z$$

The right sides of these equations are just the three components of the vector $\vec A$, so if this theorem is to be true the only 
way possible is that its components have the values 

$$A_x = f(\hat x) \quad\text{and}\quad A_y = f(\hat y) \quad\text{and}\quad A_z = f(\hat z) \tag{12.6}$$

Now check whether the vector with these components does the job. 

$$f(\vec v) = f(v_x\hat x + v_y\hat y + v_z\hat z) = v_x f(\hat x) + v_y f(\hat y) + v_z f(\hat z)$$

This is simply the property of linearity, Eq. (12.3). Now use the proposed values of the components from the preceding 
equation and this is exactly what’s needed: $A_xv_x + A_yv_y + A_zv_z = \vec A\cdot\vec v$. 
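
The construction in this proof can be tried numerically. The following is only a sketch: the functional `example_functional` and the test vector are made-up illustrations, not anything from the text. It builds $\vec A$ by evaluating $f$ on the basis vectors, as in Eq. (12.6), and compares $f(\vec v)$ with $\vec A\cdot\vec v$.

```python
def dot(a, b):
    """Ordinary scalar product of two component lists."""
    return sum(x * y for x, y in zip(a, b))


def representing_vector(f, dim=3):
    """Components A_i = f(e_i) on the standard orthonormal basis, Eq. (12.6)."""
    basis = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    return [f(e) for e in basis]


def example_functional(v):
    # Hypothetical linear functional, chosen only for illustration.
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


A = representing_vector(example_functional)
v = [3.0, 4.0, -2.0]
# For a linear f these two numbers agree.
difference = example_functional(v) - dot(A, v)
```

If `f` were not linear, `difference` would generally be non-zero, which is one quick way to see where linearity enters the proof.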

Multilinear Functionals 

Functionals can be generalized to more than one variable. A bilinear functional is a scalar valued function of two vector 
variables, linear in each: 

$$T(\vec v_1, \vec v_2) = \text{a scalar}$$

$$T(\alpha\vec v_1 + \beta\vec v_2, \vec v_3) = \alpha T(\vec v_1, \vec v_3) + \beta T(\vec v_2, \vec v_3) \tag{12.7}$$

$$T(\vec v_1, \alpha\vec v_2 + \beta\vec v_3) = \alpha T(\vec v_1, \vec v_2) + \beta T(\vec v_1, \vec v_3)$$

Similarly for multilinear functionals, with as many arguments as you want. 

Now apply the representation theorem for functionals to the subject of tensors. Start with a bilinear functional so 
that ${}_2T(\vec v_1, \vec v_2)$ is a scalar. This function of two variables can be looked on as a function of one variable by holding the 
other one temporarily fixed. Say $\vec v_2$ is held fixed, then ${}_2T(\vec v_1, \vec v_2)$ defines a linear functional on the variable $\vec v_1$. Apply 
the representation theorem now and the result is 

$${}_2T(\vec v_1, \vec v_2) = \vec v_1\cdot\vec A$$

The vector $\vec A$ however will depend (linearly) on the choice of $\vec v_2$. It defines a new function that I’ll call ${}_1T$: 

$$\vec A = {}_1T(\vec v_2) \tag{12.8}$$

This defines a tensor ${}_1T$, a linear, vector-valued function of a vector. That is, starting from a bilinear functional 
you can construct a linear vector-valued function. The reverse of this statement is easy to see because if you start with 
${}_1T(\vec u)$ you can define a new function of two variables ${}_2T(\vec w, \vec u) = \vec w\cdot{}_1T(\vec u)$, and this is a bilinear functional, the same 
one you started with in fact. 

With this close association between the two concepts it is natural to extend the definition of a tensor to include 
bilinear functionals. To be precise, I used a different name for the vector-valued function of one vector variable (${}_1T$) 
and for the scalar-valued function of two vector variables (${}_2T$). This is overly fussy, and it's common practice to use the 
same symbol ($T$) for both, with the hope that the context will make clear which one you actually mean. In fact it is so 
fussy that I will stop doing it. The rank of the tensor in either case is the sum of the number of vectors involved, two 
($= 1 + 1 = 0 + 2$) in this case. 

The next extension of the definition follows naturally from the previous reformulation. A tensor of $n$th rank is 
an $n$-linear functional, or any one of the several types of functions that can be constructed from it by the preceding 
argument. The meaning and significance of the last statement should become clear a little later. In order to clarify the 
meaning of this terminology, some physical examples are in order. The tensor of inertia was mentioned before: 

$$\vec L = I(\vec\omega).$$

The dielectric tensor relates $\vec D$ and $\vec E$: 

$$\vec D = \epsilon(\vec E)$$

The conductivity tensor relates current to the electric field: 

$$\vec j = \sigma(\vec E)$$

In general this is not just a scalar factor, and for the a.c. case $\sigma$ is a function of frequency. 

The stress tensor in matter is defined as follows: If a body has forces on it (compression or 
twisting or the like) or even internal defects arising from its formation, one part of the body will exert 
a force on another part. This can be made precise by the following device: Imagine making a cut in 
the material, then because of the internal forces, the two parts will tend to move with respect to each 
other. Apply enough force to prevent this motion. Call it $\Delta\vec F$. Typically for small cuts $\Delta\vec F$ will be 
proportional to the area of the cut. The area vector is perpendicular to the cut and of magnitude equal 
to the area. For small areas you have the differential relation $d\vec F = S(d\vec A)$. This function $S$ is called the stress tensor 
or pressure tensor. If you did problem 8.11 you saw a two dimensional special case of this, though in that case it was 
isotropic, leading to a scalar for the stress (also called the tension). 

There is another second rank tensor called the strain tensor. I described it qualitatively in section 9.2 and I'll 
simply add here that it is a second rank tensor. When you apply stress to a solid body it will develop strain. This defines 
a function with a second rank tensor as input and a second rank tensor as output. It is the elasticity tensor and it has 
rank four. 

So far, all the physically defined tensors except elasticity have been vector-valued functions of vector variables, 
and I haven’t used the n-linear functional idea directly. However there is a very simple example of such a tensor: 

$$\text{work} = \vec F\cdot\vec d$$

This is a scalar valued function of the two vectors $\vec F$ and $\vec d$. This is of course true for the scalar product of any two 
vectors $\vec a$ and $\vec b$: 

$$g(\vec a, \vec b) = \vec a\cdot\vec b \tag{12.9}$$

g is a bilinear functional called the metric tensor. There are many other physically defined tensors that you will encounter 
later. In addition I would like to emphasize that although the examples given here will be in three dimensions, the 
formalism developed will be applicable to any number of dimensions. 


[Figure: a cut in the material, with the area vector A perpendicular to the cut.]
12.2 Components 

Up to this point, all that I've done is to make some rather general statements about tensors and I've given no techniques 
for computing with them. That's the next step. I’ll eventually develop the complete apparatus for computation in an 
arbitrary basis, but for the moment it's a little simpler to start out with the more common orthonormal basis vectors, 
and even there I'll stay with rectangular coordinates for a while. (Recall that an orthonormal basis is an independent set 
of orthogonal unit vectors, such as $\hat x$, $\hat y$, $\hat z$.) Some of this material was developed in chapter seven, but I'll duplicate 
some of it. Start off by examining a second rank tensor, viewed as a vector valued function 

$$\vec u = T(\vec v)$$

The vector $\vec v$ can be written in terms of the three basis vectors like $\hat x$, $\hat y$, $\hat z$. Or, as I shall denote them, $\hat e_1$, $\hat e_2$, $\hat e_3$, where 

$$|\hat e_1| = |\hat e_2| = |\hat e_3| = 1, \quad\text{and}\quad \hat e_1\cdot\hat e_2 = 0 \quad\text{etc.} \tag{12.10}$$

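In terms of these independent vectors, $\vec v$ has components $v_1$, $v_2$, $v_3$: 

$$\vec v = v_1\hat e_1 + v_2\hat e_2 + v_3\hat e_3 \tag{12.11}$$

The vector $\vec u = T(\vec v)$ can also be expanded in the same way: 

$$\vec u = u_1\hat e_1 + u_2\hat e_2 + u_3\hat e_3 \tag{12.12}$$

Look at $T(\vec v)$ more closely in terms of the components 

$$\begin{aligned}
T(\vec v) &= T(v_1\hat e_1 + v_2\hat e_2 + v_3\hat e_3) \\
&= v_1T(\hat e_1) + v_2T(\hat e_2) + v_3T(\hat e_3)
\end{aligned}$$

(by linearity). Each of the three objects $T(\hat e_1)$, $T(\hat e_2)$, $T(\hat e_3)$ is a vector, which means that you can expand each one 
in terms of the original unit vectors 

$$\begin{aligned}
T(\hat e_1) &= T_{11}\hat e_1 + T_{21}\hat e_2 + T_{31}\hat e_3 \\
T(\hat e_2) &= T_{12}\hat e_1 + T_{22}\hat e_2 + T_{32}\hat e_3 \\
T(\hat e_3) &= T_{13}\hat e_1 + T_{23}\hat e_2 + T_{33}\hat e_3
\end{aligned}
\qquad\text{or more compactly,}\qquad T(\hat e_i) = \sum_j T_{ji}\hat e_j \tag{12.13}$$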
The numbers $T_{ij}$ ($i,j = 1, 2, 3$) are called the components of the tensor in the given basis. These numbers will depend 
on the basis chosen, just as do the numbers $v_i$, the components of the vector $\vec v$. The ordering of the indices has been 
chosen for later convenience, with the sum on the first index of the $T_{ji}$. This equation is the fundamental equation from 
which everything else is derived. (It will be modified when non-orthonormal bases are introduced later.) 

Now, take these expressions for $T(\hat e_i)$ and plug them back into the equation $\vec u = T(\vec v)$: 

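$$\begin{aligned}
u_1\hat e_1 + u_2\hat e_2 + u_3\hat e_3 = T(\vec v) &= v_1\big[T_{11}\hat e_1 + T_{21}\hat e_2 + T_{31}\hat e_3\big] \\
&\quad + v_2\big[T_{12}\hat e_1 + T_{22}\hat e_2 + T_{32}\hat e_3\big] \\
&\quad + v_3\big[T_{13}\hat e_1 + T_{23}\hat e_2 + T_{33}\hat e_3\big] \\
&= \big[T_{11}v_1 + T_{12}v_2 + T_{13}v_3\big]\hat e_1 \\
&\quad + \big[T_{21}v_1 + T_{22}v_2 + T_{23}v_3\big]\hat e_2 \\
&\quad + \big[T_{31}v_1 + T_{32}v_2 + T_{33}v_3\big]\hat e_3
\end{aligned}$$

Comparing the coefficients of the unit vectors, you get the relations among the components 

$$\begin{aligned}
u_1 &= T_{11}v_1 + T_{12}v_2 + T_{13}v_3 \\
u_2 &= T_{21}v_1 + T_{22}v_2 + T_{23}v_3 \\
u_3 &= T_{31}v_1 + T_{32}v_2 + T_{33}v_3
\end{aligned} \tag{12.14}$$

More compactly: 

$$u_i = \sum_{j=1}^{3} T_{ij}v_j \qquad\text{or}\qquad
\begin{pmatrix}u_1\\u_2\\u_3\end{pmatrix} =
\begin{pmatrix}T_{11}&T_{12}&T_{13}\\T_{21}&T_{22}&T_{23}\\T_{31}&T_{32}&T_{33}\end{pmatrix}
\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} \tag{12.15}$$
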


At this point it is convenient to use the summation convention (first* version). This convention says that if a 
given term contains a repeated index, then a summation over all the possible values of that index is understood. With 
this convention, the previous equation is 

$$u_i = T_{ij}v_j. \tag{12.16}$$

Notice how the previous choice of indices has led to the conventional result, with the first index denoting the row and 
the second the column of a matrix. 


* See section 12.5 for the later modification and generalization. 
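Now to take an example and tear it apart. Define a tensor by the equations 

$$T(\hat x) = \hat x + \hat y, \qquad T(\hat y) = \hat y, \tag{12.17}$$

where $\hat x$ and $\hat y$ are given orthogonal unit vectors. These two expressions, combined with linearity, suffice to determine 
the effect of the tensor on all linear combinations of $\hat x$ and $\hat y$. (This is a two dimensional problem.) 

To compute the components of the tensor pick a set of basis vectors. The obvious ones in this instance are 

$$\hat e_1 = \hat x, \quad\text{and}\quad \hat e_2 = \hat y$$

By comparison with Eq. (12.13), you can read off the components of $T$. 

$$T_{11} = 1 \qquad T_{21} = 1$$

$$T_{12} = 0 \qquad T_{22} = 1.$$

Write these in the form of a matrix as in Eq. (12.15) 

$$\begin{pmatrix}T_{11}&T_{12}\\T_{21}&T_{22}\end{pmatrix} = \begin{pmatrix}1&0\\1&1\end{pmatrix}$$

and writing the vector components in the same way, the components of the vectors $\hat x$ and $\hat y$ are respectively 

$$\begin{pmatrix}1\\0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}0\\1\end{pmatrix}$$

The original equations (12.17) that defined the tensor become, in components, 

$$\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}$$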
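
The same example can be checked by machine. This is a minimal sketch, assuming an orthonormal basis so that the component $T_{ij}$ is $\hat e_i\cdot T(\hat e_j)$; the helper names `components` and `apply_matrix` are mine, not the text's.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def T(v):
    # T(x) = x + y and T(y) = y, extended by linearity:
    # T(a x + b y) = a x + (a + b) y.
    return [v[0], v[0] + v[1]]


def components(tensor, basis):
    """T_ij = e_i . T(e_j), valid for an orthonormal basis (Eq. 12.13)."""
    n = len(basis)
    return [[dot(basis[i], tensor(basis[j])) for j in range(n)] for i in range(n)]


def apply_matrix(m, v):
    """u_i = T_ij v_j, Eq. (12.16), with the sum written out."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


basis = [[1.0, 0.0], [0.0, 1.0]]
T_matrix = components(T, basis)          # first index = row, second = column
u = apply_matrix(T_matrix, [3.0, 4.0])   # should agree with T([3.0, 4.0])
```

The row and column placement produced by `components` is the same convention as Eq. (12.15): the images of the basis vectors appear as the columns of the matrix.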
12.3 Relations between Tensors 

Go back to the fundamental representation theorem for linear functionals and see what it looks like in component form. 
Evaluate $f(\vec v)$, where $\vec v = v_i\hat e_i$. (The linear functional has one vector argument and a scalar output.) 

$$f(\vec v) = f(v_i\hat e_i) = v_i f(\hat e_i) \tag{12.18}$$

Denote the set of numbers $f(\hat e_i)$ ($i = 1, 2, 3$) by $A_i = f(\hat e_i)$, in which case, 

$$f(\vec v) = A_iv_i = A_1v_1 + A_2v_2 + A_3v_3$$

Now it is clear that the vector $\vec A$ of the theorem is just 

$$\vec A = A_1\hat e_1 + A_2\hat e_2 + A_3\hat e_3 \tag{12.19}$$

Again, examine the problem of starting from a bilinear functional and splitting off one of the two arguments in 
order to obtain a vector valued function of a vector. I want to say 

$$T(\vec u, \vec v) = \vec u\cdot T(\vec v)$$

for all vectors $\vec u$ and $\vec v$. You should see that using the same symbol, $T$, for both functions doesn't cause any trouble. 
Given the bilinear functional, what is the explicit form for $T(\vec v)$? The answer is most readily found by a bit of trial and 
error until you reach the following result: 

$$T(\vec v) = \hat e_i\,T(\hat e_i, \vec v) \tag{12.20}$$

(Remember the summation convention.) To verify this relation, multiply by an arbitrary vector, $\vec u = u_j\hat e_j$: 

$$\vec u\cdot T(\vec v) = (u_j\hat e_j)\cdot\hat e_i\,T(\hat e_i, \vec v)$$

which is, by the orthonormality of the $\hat e$’s, 

$$u_j\delta_{ji}T(\hat e_i, \vec v) = u_iT(\hat e_i, \vec v) = T(\vec u, \vec v)$$

This says that the above expression is in fact the correct one. Notice also the similarity between this construction and 
the one in equation (12.19) for $\vec A$. 
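
Equation (12.20) translates almost word for word into code. In this sketch the bilinear functional is defined by an arbitrary matrix `M` that I made up for illustration; the point is only that the vector function built from it satisfies $\vec u\cdot T(\vec v) = T(\vec u, \vec v)$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Orthonormal basis e_1, e_2, e_3.
BASIS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical components, used only to define some bilinear functional.
M = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0],
     [4.0, 0.0, 5.0]]


def bilinear(u, v):
    """Scalar-valued T(u, v) = u_i M_ij v_j."""
    return sum(u[i] * M[i][j] * v[j] for i in range(3) for j in range(3))


def vector_function(t2):
    """Build T(v) = e_i T(e_i, v), Eq. (12.20); the i-th component is T(e_i, v)."""
    def t1(v):
        return [t2(e, v) for e in BASIS]
    return t1


T1 = vector_function(bilinear)
u = [1.0, -2.0, 3.0]
v = [2.0, 0.0, 1.0]
lhs = dot(u, T1(v))    # u . T(v)
rhs = bilinear(u, v)   # T(u, v)
```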
Now take $T(\vec v)$ from Eq. (12.20) and express $\vec v$ in terms of its components 

$$\vec v = v_j\hat e_j, \quad\text{then}\quad T(\vec v) = \hat e_i\,T(\hat e_i, v_j\hat e_j) = \hat e_i\,T(\hat e_i, \hat e_j)v_j$$

The $i$ component of this expression is 

$$T(\hat e_i, \hat e_j)v_j = T_{ij}v_j$$

a result already obtained in Eq. (12.16). 

There's a curiosity involved here; why should the left hand entry in $T(\ ,\ )$ be singled out to construct 

$$\hat e_i\,T(\hat e_i, \vec v)?$$

Why not use the right hand one instead? Answer: No reason at all. It’s easy enough to find out what happens when 
you do this. Examine 

$$\hat e_i\,T(\vec v, \hat e_i) \equiv \tilde T(\vec v) \tag{12.21}$$

Put $\vec v = v_j\hat e_j$, and you get 

$$\hat e_i\,T(v_j\hat e_j, \hat e_i) = \hat e_i\,T(\hat e_j, \hat e_i)v_j$$

The $i$th component of which is 

$$T_{ji}v_j$$

If you write this as a square matrix times a column matrix, the only difference between this result and that of Eq. (12.16) 
is that the matrix is transposed. This vector valued function $\tilde T$ is called the transpose of the tensor $T$. The nomenclature 
comes from the fact that in the matrix representation, the matrix of one equals the transpose of the other's matrix. 

By an extension of the language, this applies to the other form of the tensor, $\tilde T$: 

$$\tilde T(\vec u, \vec v) = T(\vec v, \vec u) \tag{12.22}$$


Symmetries 

Two of the common and important classifications of matrices, symmetric and antisymmetric, have their reflections in 
tensors. A symmetric tensor is one that equals its transpose and an antisymmetric tensor is one that is the negative of 
its transpose. It is easiest to see the significance of this when the tensor is written in the bilinear functional form: 


$$T_{ij} = T(\hat e_i, \hat e_j)$$

This matrix will equal its transpose if and only if 

$$T(\vec u, \vec v) = T(\vec v, \vec u)$$

for all $\vec u$ and $\vec v$. Similarly, if for all $\vec u$ and $\vec v$ 

$$T(\vec u, \vec v) = -T(\vec v, \vec u)$$

then $T = -\tilde T$. Notice that it doesn't matter whether I speak of $T$ as a scalar-valued function of two variables or as a 
vector-valued function of one; the symmetry properties are the same. 

From these definitions, it is possible to take an arbitrary tensor and break it up into its symmetric part and its 
antisymmetric part: 

$$\begin{aligned}
T &= \tfrac12\big(T + \tilde T\big) + \tfrac12\big(T - \tilde T\big) = T_s + T_a \\
T_s(\vec u, \vec v) &= \tfrac12\big[T(\vec u, \vec v) + T(\vec v, \vec u)\big] \\
T_a(\vec u, \vec v) &= \tfrac12\big[T(\vec u, \vec v) - T(\vec v, \vec u)\big]
\end{aligned} \tag{12.23}$$
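
In components the split of Eq. (12.23) is just a statement about matrices. A small sketch, using the component matrix of the earlier example tensor (12.17) as input; the function names are my own:

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def symmetric_part(m):
    """Components of T_s = (T + T~)/2."""
    mt = transpose(m)
    return [[0.5 * (m[i][j] + mt[i][j]) for j in range(len(m))] for i in range(len(m))]


def antisymmetric_part(m):
    """Components of T_a = (T - T~)/2."""
    mt = transpose(m)
    return [[0.5 * (m[i][j] - mt[i][j]) for j in range(len(m))] for i in range(len(m))]


T_matrix = [[1.0, 0.0], [1.0, 1.0]]   # components of the tensor in Eq. (12.17)
Ts = symmetric_part(T_matrix)
Ta = antisymmetric_part(T_matrix)
# Adding the two parts back together recovers the original components.
recombined = [[Ts[i][j] + Ta[i][j] for j in range(2)] for i in range(2)]
```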


Many of the common tensors such as the tensor of inertia and the dielectric tensor are symmetric. The magnetic 
field tensor in contrast, is antisymmetric. The basis of this symmetry in the case of the dielectric tensor is in the relation 
for the energy density in an electric field, $\int\vec E\cdot d\vec D$.* Apply an electric field in the $x$ direction, then follow it by adding 
a field in the $y$ direction; undo the field in the $x$ direction and then undo the field in the $y$ direction. The condition that 
the energy density returns to zero is the condition that the dielectric tensor is symmetric. 

All of the above discussions concerning the symmetry properties of tensors were phrased in terms of second rank 
tensors. The extensions to tensors of higher rank are quite easy. For example in the case of a third rank tensor viewed 
as a 3-linear functional, it would be called completely symmetric if 

$$T(\vec u, \vec v, \vec w) = T(\vec v, \vec u, \vec w) = T(\vec u, \vec w, \vec v) = \text{etc.}$$

for all permutations of $\vec u$, $\vec v$, $\vec w$, and for all values of these vectors. Similarly, if any interchange of two arguments changed 
the value by a sign, 

$$T(\vec u, \vec v, \vec w) = -T(\vec v, \vec u, \vec w) = +T(\vec v, \vec w, \vec u) = \text{etc.}$$

then $T$ is completely antisymmetric. It is possible to have a mixed symmetry, where there is for example symmetry 
on interchange of the arguments in the first and second place and antisymmetry between the second and third. 

* This can be proved by considering the energy in a plane parallel plate capacitor, which is, by definition of potential, 
$\int V\,dq$. The potential difference $V$ is the magnitude of the $\vec E$ field times the distance between the capacitor plates. 
[$V = Ed$.] ($\vec E$ is perpendicular to the plates by $\nabla\times\vec E = 0$.) The normal component of $\vec D$ is related to $q$ by $\nabla\cdot\vec D = \rho$. 
[$A\,\vec D\cdot\hat n = q$.] Combining these, and dividing by the volume gives the energy density as $\int\vec E\cdot d\vec D$. 


Alternating Tensor 

A curious (and very useful) result about antisymmetric tensors is that in three dimensions there is, up to a factor, exactly 
one totally antisymmetric third rank tensor; it is called the “alternating tensor.” So, if you take any two such tensors, $\Lambda$ 
and $\Lambda'$, then one must be a multiple of the other. (The same holds true for the $n$th rank totally antisymmetric tensor in 
$n$ dimensions.) 

Proof: Consider the function $\Lambda - \alpha\Lambda'$ where $\alpha$ is a scalar. Pick any three independent vectors $\vec v_{10}$, $\vec v_{20}$, $\vec v_{30}$ as 
long as $\Lambda'$ on this set is non-zero. Let 

$$\alpha = \frac{\Lambda(\vec v_{10}, \vec v_{20}, \vec v_{30})}{\Lambda'(\vec v_{10}, \vec v_{20}, \vec v_{30})} \tag{12.24}$$

(If $\Lambda'$ gives zero for every set of $\vec v$s then it’s a trivial tensor, zero.) This choice of $\alpha$ guarantees that $\Lambda - \alpha\Lambda'$ will vanish 
for at least one set of values of the arguments. Now take a general set of three vectors $\vec v_1$, $\vec v_2$, and $\vec v_3$ and ask for the 
effect of $\Lambda - \alpha\Lambda'$ on them. $\vec v_1$, $\vec v_2$, and $\vec v_3$ can be expressed as linear combinations of the original $\vec v_{10}$, $\vec v_{20}$, and $\vec v_{30}$. Do 
so. Substitute into $\Lambda - \alpha\Lambda'$, use linearity and notice that all possible terms give zero. 

The above argument is unchanged in a higher number of dimensions. It is also easy to see that you cannot have 
a totally antisymmetric tensor of rank $n + 1$ in $n$ dimensions. In this case, one of the $n + 1$ variables would have to be a 
linear combination of the other $n$. Use linearity, and note that when any two variables equal each other, antisymmetry 
forces the result to be zero. These observations imply that the function must vanish identically. See also problem 12.17. 
If this sounds familiar, look back at section 7.7. 
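
A concrete totally antisymmetric 3-linear functional in three dimensions is the signed sum over permutations that also defines a $3\times3$ determinant (equivalently the triple scalar product). The sketch below is an illustration under that assumption; the `scale` parameter and the sample vectors are mine. It shows the sign change on interchange, the vanishing for repeated arguments, and a constant ratio between two such functionals.

```python
from itertools import permutations


def permutation_sign(p):
    """+1 for an even permutation, -1 for an odd one (count inversions)."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign


def alternating(u, v, w, scale=1.0):
    """Totally antisymmetric trilinear functional; scale=1 gives det[u v w]."""
    total = sum(permutation_sign(p) * u[p[0]] * v[p[1]] * w[p[2]]
                for p in permutations(range(3)))
    return scale * total


u, v, w = [1, 2, 3], [0, 1, 4], [5, 6, 0]
value = alternating(u, v, w)
swapped = alternating(v, u, w)       # one interchange flips the sign
repeated = alternating(u, u, w)      # equal arguments force zero
ratio = alternating(u, v, w, scale=2.5) / value   # same ratio for any independent triple
```

The second functional here is a multiple of the first by construction, so this only illustrates the statement of the theorem; it does not replace the proof.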


12.4 Birefringence 

It’s time to stop and present a serious calculation using tensors, one that leads to an interesting and not at all obvious 
result. This development assumes that you've studied Maxwell’s electromagnetic field equations and are comfortable 
with vector calculus. If not then you will find this section obscure, maybe best left for another day. The particular 
problem that I will examine is how light passes through a transparent material, especially a crystal that has different 
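electrical properties in different directions. The common and best known example of such a crystal is Iceland spar, a 
form of calcite (CaCO$_3$). 

$$\nabla\times\vec E = -\frac{\partial\vec B}{\partial t}, \qquad
\nabla\times\vec B = \mu_0\vec j + \mu_0\epsilon_0\frac{\partial\vec E}{\partial t}, \qquad
\vec j = \frac{\partial\vec P}{\partial t}, \qquad
\vec P = \alpha(\vec E) \tag{12.25}$$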
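$\vec E$ and $\vec B$ are the electric and magnetic fields. $\vec P$ is the polarization of the medium, the electric dipole moment density. 
$\alpha$ is the polarizability tensor, relating the amount of distortion of the charges in the matter to the applied electric field. 
The current density $\vec j$ appears because a time-varying electric field will cause a time-varying movement of the charges in 
the surrounding medium, and that’s a current. 

Take the curl of the first equation and the time derivative of the second. 

$$\nabla\times\nabla\times\vec E = -\frac{\partial}{\partial t}\nabla\times\vec B
\qquad\qquad
\nabla\times\frac{\partial\vec B}{\partial t} = \mu_0\frac{\partial\vec j}{\partial t} + \mu_0\epsilon_0\frac{\partial^2\vec E}{\partial t^2}$$

The two expressions involving $\vec B$ are the same, so eliminate $\vec B$. 

$$\nabla\times\nabla\times\vec E = -\mu_0\frac{\partial\vec j}{\partial t} - \mu_0\epsilon_0\frac{\partial^2\vec E}{\partial t^2}
= -\mu_0\frac{\partial^2\alpha(\vec E)}{\partial t^2} - \mu_0\epsilon_0\frac{\partial^2\vec E}{\partial t^2}$$

I make the assumption that $\alpha$ is time independent, so this is 

$$\nabla\times\nabla\times\vec E = \nabla(\nabla\cdot\vec E) - \nabla^2\vec E
= -\mu_0\,\alpha\!\left(\frac{\partial^2\vec E}{\partial t^2}\right) - \mu_0\epsilon_0\frac{\partial^2\vec E}{\partial t^2}$$

I am seeking a wave solution for the field, so assume a solution $\vec E(\vec r, t) = \vec E_0e^{i\vec k\cdot\vec r - i\omega t}$. Each $\nabla$ brings down a factor of 
$i\vec k$ and each time derivative a factor $-i\omega$, so the equation is 

$$-\vec k(\vec k\cdot\vec E_0) + k^2\vec E_0 = \mu_0\omega^2\alpha(\vec E_0) + \mu_0\epsilon_0\omega^2\vec E_0 \tag{12.26}$$

This is a linear equation for the vector $\vec E_0$. 

A special case first: A vacuum, so there is no medium and $\alpha = 0$. The equation is 

$$(k^2 - \mu_0\epsilon_0\omega^2)\vec E_0 - \vec k(\vec k\cdot\vec E_0) = 0$$

Pick a basis so that $\hat z$ is along $\vec k$, then 

$$(k^2 - \mu_0\epsilon_0\omega^2)\vec E_0 - \hat z\,k^2\,\hat z\cdot\vec E_0 = 0$$

or in matrix notation, 

$$\left[(k^2 - \mu_0\epsilon_0\omega^2)\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
- k^2\begin{pmatrix}0&0&0\\0&0&0\\0&0&1\end{pmatrix}\right]
\begin{pmatrix}E_{0x}\\E_{0y}\\E_{0z}\end{pmatrix} = 0 \tag{12.27}$$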
This is a set of linear homogeneous equations for the components of the electric field. One solution for $\vec E$ is identically 
zero, and this solution is unique unless the determinant of the coefficients vanishes. That is 

$$(k^2 - \mu_0\epsilon_0\omega^2)^2(-\mu_0\epsilon_0\omega^2) = 0$$

This is a cubic equation for $\omega^2$. One root is zero, the other two are $\omega^2 = k^2/\mu_0\epsilon_0$. The eigenvector corresponding to 
the zero root has components the column matrix $(0\ \ 0\ \ 1)$, or $E_{0x} = 0$ and $E_{0y} = 0$ with the $z$-component arbitrary. 

The field for this solution is 

$$\vec E = E_{0z}\hat z\,e^{ikz}, \quad\text{then}\quad \nabla\cdot\vec E = \rho/\epsilon_0 = ikE_{0z}e^{ikz}$$

This is a static charge density, not a wave, so look to the other solutions. They are 

$$E_{0z} = 0; \quad\text{with } E_{0x},\ E_{0y} \text{ arbitrary and } \vec k\cdot\vec E_0 = \nabla\cdot\vec E = 0$$

$$\vec E = (E_{0x}\hat x + E_{0y}\hat y)e^{ikz - i\omega t} \quad\text{and}\quad \omega^2/k^2 = 1/\mu_0\epsilon_0 = c^2$$

This is a plane, transversely polarized, electromagnetic wave moving in the $z$-direction at velocity $\omega/k = 1/\sqrt{\mu_0\epsilon_0}$. 

Now return to the case in which $\alpha \ne 0$. If the polarizability is a multiple of the identity, so that the dipole moment 
density is always along the direction of $\vec E$, all this does is to add a constant to $\epsilon_0$ in Eq. (12.26). $\epsilon_0 \to \epsilon_0 + \alpha = \epsilon$, and 
the speed of the wave changes from $c = 1/\sqrt{\mu_0\epsilon_0}$ to $v = 1/\sqrt{\mu_0\epsilon}$. 

The more complicated case occurs when $\alpha$ is more than just multiplication by a scalar. If the medium is a crystal 
in which the charges can be polarized more easily in some directions than in others, $\alpha$ is a tensor. It is a symmetric 
tensor, though I won’t prove that here. The proof involves looking at energy dissipation (rather, lack of it) as the electric 
field is varied. To compute I will pick a basis, and choose it so that the components of $\alpha$ form a diagonal matrix in this 
basis. 

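Pick $\hat e_1 = \hat x$, $\hat e_2 = \hat y$, $\hat e_3 = \hat z$ so that 

$$(\alpha) = \begin{pmatrix}\alpha_{11}&0&0\\0&\alpha_{22}&0\\0&0&\alpha_{33}\end{pmatrix}$$

The combination that really appears in Eq. (12.26) is $\epsilon_0 + \alpha$. The first term is a scalar, a multiple of the identity, so the 
matrix for this is 

$$\epsilon_0\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix} + (\alpha)
= \begin{pmatrix}\epsilon_{11}&0&0\\0&\epsilon_{22}&0\\0&0&\epsilon_{33}\end{pmatrix} \tag{12.28}$$
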

I’m setting this up assuming that the crystal has three different directions with three different polarizabilities. When I 
get down to the details of the calculation I will take two of them equal; that's the case for calcite. The direction of 
propagation of the wave is $\vec k$, and Eq. (12.26) is 

$$\left[k^2\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
- \mu_0\omega^2\begin{pmatrix}\epsilon_{11}&0&0\\0&\epsilon_{22}&0\\0&0&\epsilon_{33}\end{pmatrix}
- \begin{pmatrix}k_1k_1&k_1k_2&k_1k_3\\k_2k_1&k_2k_2&k_2k_3\\k_3k_1&k_3k_2&k_3k_3\end{pmatrix}\right]
\begin{pmatrix}E_{0x}\\E_{0y}\\E_{0z}\end{pmatrix} = 0 \tag{12.29}$$

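In order to have a non-zero solution for the electric field, the determinant of this total $3\times3$ matrix must be zero. 
This is starting to get too messy, so I’ll now make the simplifying assumption that two of the three directions in the 
crystal have identical properties: $\epsilon_{11} = \epsilon_{22}$. This makes the system cylindrically symmetric about the $z$-axis and I can 
then take the direction of the vector $\vec k$ to be in the $x$-$z$ plane for convenience. 

$$\vec k = k(\hat e_1\sin\alpha + \hat e_3\cos\alpha)$$

(From here on $\alpha$ denotes the angle between $\vec k$ and the $z$-axis, not the polarizability.) The matrix of coefficients in Eq. (12.29) is now 

$$\begin{aligned}
&k^2\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
- \mu_0\omega^2\begin{pmatrix}\epsilon_{11}&0&0\\0&\epsilon_{11}&0\\0&0&\epsilon_{33}\end{pmatrix}
- k^2\begin{pmatrix}\sin^2\alpha&0&\sin\alpha\cos\alpha\\0&0&0\\\sin\alpha\cos\alpha&0&\cos^2\alpha\end{pmatrix} \\
&\qquad= \begin{pmatrix}
k^2\cos^2\alpha - \mu_0\omega^2\epsilon_{11} & 0 & -k^2\sin\alpha\cos\alpha \\
0 & k^2 - \mu_0\omega^2\epsilon_{11} & 0 \\
-k^2\sin\alpha\cos\alpha & 0 & k^2\sin^2\alpha - \mu_0\omega^2\epsilon_{33}
\end{pmatrix}
\end{aligned} \tag{12.30}$$

The determinant of this matrix is 

$$\begin{aligned}
&(k^2 - \mu_0\omega^2\epsilon_{11})\Big[(k^2\cos^2\alpha - \mu_0\omega^2\epsilon_{11})(k^2\sin^2\alpha - \mu_0\omega^2\epsilon_{33}) - (-k^2\sin\alpha\cos\alpha)^2\Big] \\
&\qquad= (k^2 - \mu_0\omega^2\epsilon_{11})\Big[-\mu_0\omega^2k^2(\epsilon_{11}\sin^2\alpha + \epsilon_{33}\cos^2\alpha) + \mu_0^2\omega^4\epsilon_{11}\epsilon_{33}\Big] = 0
\end{aligned} \tag{12.31}$$
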


This has a factor $\mu_0\omega^2 = 0$, and the corresponding eigenvector is $\vec k$ itself. In column matrix notation that is 
$(\sin\alpha\ \ 0\ \ \cos\alpha)$. It is another of those static solutions that aren’t very interesting. For the rest, there are factors 

$$k^2 - \mu_0\omega^2\epsilon_{11} = 0
\qquad\text{and}\qquad
k^2(\epsilon_{11}\sin^2\alpha + \epsilon_{33}\cos^2\alpha) - \mu_0\omega^2\epsilon_{11}\epsilon_{33} = 0$$

The first of these roots, $\omega^2/k^2 = 1/\mu_0\epsilon_{11}$, has an eigenvector $\hat e_2 = \hat y$, or $(0\ \ 1\ \ 0)$. Then $\vec E = E_0\hat y\,e^{i\vec k\cdot\vec r - i\omega t}$ and this 
is a normal sort of transverse wave that you commonly find in ordinary non-crystalline materials. The electric vector 
is perpendicular to the direction of propagation. The energy flow (the Poynting vector) is along the direction of $\vec k$, 
perpendicular to the wavefronts. It’s called the “ordinary ray,” and at a surface it obeys Snell's law. The reason it is 
ordinary is that the electric vector in this case is along one of the principal axes of the crystal. The polarization is along 
$\vec E$ just as it is in air or glass, so it behaves the same way as in those cases. 
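
The factorization of the determinant can be checked numerically. The sketch below builds the matrix of Eq. (12.30) and evaluates its determinant at the three roots. It assumes units in which $\mu_0 = 1$ and $k = 1$, and the permittivity values are illustrative only: just their ratio 1.056 is the figure quoted for calcite.

```python
import math

MU0 = 1.0  # assumed unit system, chosen to keep the numbers of order one


def coefficient_matrix(k, omega, alpha, eps11, eps33):
    """Matrix of Eq. (12.30) for k in the x-z plane at angle alpha to z."""
    s, c = math.sin(alpha), math.cos(alpha)
    m = MU0 * omega ** 2
    return [
        [k * k * c * c - m * eps11, 0.0, -k * k * s * c],
        [0.0, k * k - m * eps11, 0.0],
        [-k * k * s * c, 0.0, k * k * s * s - m * eps33],
    ]


def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def omega_ordinary(k, eps11):
    """Root of k^2 - mu0 omega^2 eps11 = 0."""
    return k / math.sqrt(MU0 * eps11)


def omega_extraordinary(k, alpha, eps11, eps33):
    """Root of k^2 (eps11 sin^2 + eps33 cos^2) - mu0 omega^2 eps11 eps33 = 0."""
    s, c = math.sin(alpha), math.cos(alpha)
    return k * math.sqrt((eps11 * s * s + eps33 * c * c) / (MU0 * eps11 * eps33))


k, angle, e11, e33 = 1.0, 0.6, 1.056, 1.0
dets = [det3(coefficient_matrix(k, w, angle, e11, e33))
        for w in (0.0, omega_ordinary(k, e11), omega_extraordinary(k, angle, e11, e33))]
```

Each entry of `dets` should be zero to rounding error, while a frequency picked between the roots gives a determinant that is clearly non-zero.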
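The second root has $E_{0y} = 0$, and an eigenvector computed as 

$$\begin{aligned}
\big(k^2\cos^2\alpha - \mu_0\omega^2\epsilon_{11}\big)E_{0x} - \big(k^2\sin\alpha\cos\alpha\big)E_{0z} &= 0 \\
\big(k^2\cos^2\alpha - k^2(\epsilon_{11}\sin^2\alpha + \epsilon_{33}\cos^2\alpha)/\epsilon_{33}\big)E_{0x} - \big(k^2\sin\alpha\cos\alpha\big)E_{0z} &= 0 \\
\epsilon_{11}\sin\alpha\,E_{0x} + \epsilon_{33}\cos\alpha\,E_{0z} &= 0
\end{aligned}$$

If the two $\epsilon$’s are equal, this says that $\vec E$ is perpendicular to $\vec k$. If they aren't equal then $\vec k\cdot\vec E \ne 0$ and you can write 
this equation for the direction of $\vec E$ as 

$$E_{0x} = E_0\cos\beta, \quad E_{0z} = -E_0\sin\beta, \quad\text{then}$$

$$\epsilon_{11}\sin\alpha\cos\beta - \epsilon_{33}\cos\alpha\sin\beta = 0
\qquad\text{and so}\qquad
\tan\beta = \frac{\epsilon_{11}}{\epsilon_{33}}\tan\alpha$$

In calcite, the ratio $\epsilon_{11}/\epsilon_{33} = 1.056$, making $\beta$ a little bigger than $\alpha$. 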
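
To get a feeling for how much bigger, here is a short sketch that evaluates $\tan\beta = (\epsilon_{11}/\epsilon_{33})\tan\alpha$ for the calcite ratio quoted above. The function name and the sample angle of 45° are my own choices, and the formula is used only for $0 \le \alpha < \pi/2$.

```python
import math


def field_angle(alpha, ratio=1.056):
    """beta from tan(beta) = ratio * tan(alpha), for 0 <= alpha < pi/2."""
    return math.atan(ratio * math.tan(alpha))


alpha = math.pi / 4
beta = field_angle(alpha)
tilt = beta - alpha   # how far E is rotated away from the plane of the wavefront
# Residual of the eigenvector condition eps11 sin(a) cos(b) - eps33 cos(a) sin(b) = 0,
# written with eps33 = 1 so that eps11 equals the ratio.
residual = 1.056 * math.sin(alpha) * math.cos(beta) - math.cos(alpha) * math.sin(beta)
```

At 45° the tilt comes out to a few hundredths of a radian, small but enough to send the energy flow visibly away from $\vec k$.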


The magnetic field is in the $\vec k\times\vec E$ direction, and the energy flow of the light is along the Poynting vector, in the 
direction $\vec E\times\vec B$. In this picture, that puts $\vec B$ along the $\hat y$-direction (into the page), and then the energy of the light 
moves in the direction perpendicular to $\vec E$ and $\vec B$. That is not along $\vec k$. This means that the propagation of the wave is 
not along the perpendicular to the wave fronts. Instead the wave skitters off at an angle to the front. This is the “extraordinary 
ray.” Snell’s law says that the wavefronts bend at a surface according to a simple trigonometric relation, and they still 
do, but the energy flow of the light does not follow the direction normal to the wavefronts. The light ray does not obey 
Snell's law. 

[Figure: the ordinary ray and the extraordinary ray.]

In calcite as it naturally grows, the face of the crystal is not parallel to any of the $x$-$y$ or $x$-$z$ or $y$-$z$ planes. When 
a light ray enters the crystal perpendicular to the surface, the wave fronts are in the plane of the surface and what 
happens then depends on polarization. For the light polarized along one of the principal axes of the crystal (the $y$-axis 
in this sketch) the light behaves normally and the energy goes in an unbroken straight line. For the other polarization, 
the electric field has components along two different axes of the crystal and the energy flows in an unexpected direction, 
disobeying Snell’s law. The normal to the wave front is in the direction of the vector $\vec k$, and the direction of energy 
flow (the Poynting vector) is indicated by the vector $\vec S$. 

Simulation of a pattern of x's seen through calcite. See also Wikipedia: birefringence. 


12.5 Non-Orthogonal Bases 

The next topic is the introduction of more general computational techniques. These will lift the restriction on the type 
of basis that can be used for computing components of various tensors. Until now, the basis vectors have formed an 
orthonormal set 

$$|\hat e_i| = 1, \qquad \hat e_i\cdot\hat e_j = 0 \ \text{ if } i \ne j$$

Consider instead a more general set of vectors $\vec e_i$. These must be independent. That is, in three dimensions they are not 
coplanar. Other than this there is no restriction. Since by assumption the vectors $\vec e_i$ span the space you can write 

$$\vec v = v^i\vec e_i,$$

with the numbers $v^i$ being as before the components of the vector $\vec v$. 


NOTE: Here is a change in notation. Before, every index 
was a subscript. (It could as easily have been a super- 
script.) Now, be sure to make a careful distinction between 
sub- and superscripts. They will have different meanings. 


Reciprocal Basis 

Immediately, when you do the basic scalar product you find complications. If $\vec u = u^j\vec e_j$, then 

$$\vec u\cdot\vec v = (u^j\vec e_j)\cdot(v^i\vec e_i) = u^jv^i\,\vec e_j\cdot\vec e_i.$$

But since the $\vec e_i$ aren’t orthonormal, this is a much more complicated result than the usual scalar product such as 

$$u_xv_x + u_yv_y + u_zv_z.$$

You can’t assume that $\vec e_1\cdot\vec e_2 = 0$ any more. In order to obtain a result that looks as simple as this familiar form, 
introduce an auxiliary basis: the reciprocal basis. (This trick will not really simplify the answer; it will be the same as 
ever. It will however be in a neater form and hence easier to manipulate.) The reciprocal basis is defined by the equation 

$$\vec e_i\cdot\vec e^{\,j} = \delta_i^j = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i \ne j\end{cases} \tag{12.32}$$

The $\vec e^{\,j}$’s are vectors. The index is written as a superscript to distinguish it from the original basis, $\vec e_j$. 

To elaborate on the meaning of this equation, $\vec e^{\,1}$ is perpendicular to the plane defined by $\vec e_2$ and $\vec e_3$ and is therefore 
more or less in the direction of $\vec e_1$. Its magnitude is adjusted so that the scalar product 

$$\vec e^{\,1}\cdot\vec e_1 = 1.$$





The “direct basis” and “reciprocal basis” are used in solid state physics and especially in describing X-ray diffraction 
in crystallography. In that instance, the direct basis is the fundamental lattice of the crystal and the reciprocal basis 
would be defined from it. The reciprocal basis is used to describe the wave number vectors of scattered X-rays. 

The basis reciprocal to the reciprocal basis is the direct basis. 
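
In three dimensions the reciprocal basis can be written down with cross products: $\vec e^{\,1} = \vec e_2\times\vec e_3\,/\,\vec e_1\cdot(\vec e_2\times\vec e_3)$ and cyclic permutations, which satisfies the defining relation (12.32) by construction. That formula is the standard one from crystallography rather than something derived in this section, and the sample basis below is an arbitrary illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def reciprocal_basis(e1, e2, e3):
    """Return (e^1, e^2, e^3) satisfying e^i . e_j = delta^i_j."""
    volume = dot(e1, cross(e2, e3))
    if volume == 0:
        raise ValueError("basis vectors are coplanar, so not independent")
    return tuple([x / volume for x in cross(a, b)]
                 for a, b in ((e2, e3), (e3, e1), (e1, e2)))


direct = ([2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0])
recip = reciprocal_basis(*direct)
# Table of e^i . e_j; it should be the unit matrix.
table = [[dot(r, e) for e in direct] for r in recip]
# Taking the reciprocal again should return the direct basis.
back = reciprocal_basis(*recip)
```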

Now to use these things: Expand the vector $\vec u$ in terms of the direct basis and $\vec v$ in terms of the reciprocal basis. 

$$\begin{aligned}
\vec u = u^i\vec e_i \quad\text{and}\quad \vec v = v_j\vec e^{\,j}. \quad\text{Then}\quad
\vec u\cdot\vec v &= (u^i\vec e_i)\cdot(v_j\vec e^{\,j}) \\
&= u^iv_j\,\delta_i^j \\
&= u^iv_i = u^1v_1 + u^2v_2 + u^3v_3.
\end{aligned}$$

Notation: The superscript on the components ($u^i$) will refer to the components in the direct basis ($\vec e_i$); the subscripts 
($v_j$) will come from the reciprocal basis ($\vec e^{\,j}$). You could also have expanded $\vec u$ in terms of the reciprocal basis and $\vec v$ in 
the direct basis, then 

$$\vec u\cdot\vec v = u_iv^i = u^iv_i \tag{12.33}$$

Summation Convention 

At this point, modify the previously established summation convention: Like indices in a given term are to be summed 
when one is a subscript and one is a superscript. Furthermore the notation is designed so that this is the only kind of 
sum that should occur. If you find a term such as $u_iv_i$ then this means that you made a mistake. 

The scalar product now has a simple form in terms of components (at the cost of introducing an auxiliary basis 
set). Now for further applications to vectors and tensors. 

Terminology: The components of a vector in the direct basis are called the contravariant components of the vector: 
$v^i$. The components in the reciprocal basis are called* the covariant components: $v_j$. 

Examine the component form of the basic representation theorem for linear functionals, as in Eqs. (12.18) and 
(12.19). 

$$f(\vec v) = \vec A\cdot\vec v \quad\text{for all } \vec v.$$

$$\text{Claim:}\qquad \vec A = \vec e^{\,i}f(\vec e_i) = \vec e_if(\vec e^{\,i}) \tag{12.34}$$

The proof of this is as before: write $\vec v$ in terms of components and compute the scalar product $\vec A\cdot\vec v$. 

$$\begin{aligned}
\vec v = v^i\vec e_i. \quad\text{Then}\quad \vec A\cdot\vec v &= \big(\vec e^{\,j}f(\vec e_j)\big)\cdot\big(v^i\vec e_i\big) \\
&= v^if(\vec e_j)\,\delta_i^j \\
&= v^if(\vec e_i) = f(v^i\vec e_i) = f(\vec v).
\end{aligned}$$

* These terms are of more historical than mathematical interest. 

Analogous results hold for the expression of $\vec A$ in terms of the direct basis. 

You can see how the notation forced you into considering this expression for A. The summation convention 
requires one upper index and one lower index, so there is practically no other form that you could even consider in order 
to represent A. 

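The same sort of computations will hold for tensors. Start off with one of second rank. Just as there were covariant 
and contravariant components of a vector, there will be covariant and contravariant components of a tensor. $T(\vec u, \vec v)$ 
is a scalar. Express $\vec u$ and $\vec v$ in contravariant component form: 

$$\begin{aligned}
\vec u = u^i\vec e_i \quad\text{and}\quad \vec v = v^j\vec e_j. \quad\text{Then}\quad
T(\vec u, \vec v) &= T(u^i\vec e_i, v^j\vec e_j) \\
&= u^iv^j\,T(\vec e_i, \vec e_j) \\
&= u^iv^j\,T_{ij}
\end{aligned} \tag{12.35}$$

The numbers $T_{ij}$ are called the covariant components of the tensor $T$. 

Similarly, write $\vec u$ and $\vec v$ in terms of covariant components: 

$$\begin{aligned}
\vec u = u_i\vec e^{\,i} \quad\text{and}\quad \vec v = v_j\vec e^{\,j}. \quad\text{Then}\quad
T(\vec u, \vec v) &= T(u_i\vec e^{\,i}, v_j\vec e^{\,j}) \\
&= u_iv_j\,T(\vec e^{\,i}, \vec e^{\,j}) \\
&= u_iv_j\,T^{ij}
\end{aligned} \tag{12.36}$$

And $T^{ij}$ are the contravariant components of $T$. It is also possible to have mixed components: 

$$\begin{aligned}
T(\vec u, \vec v) &= T(u_i\vec e^{\,i}, v^j\vec e_j) \\
&= u_iv^j\,T(\vec e^{\,i}, \vec e_j) \\
&= u_iv^j\,T^i{}_j
\end{aligned}$$

As before, from the bilinear functional, a linear vector valued function can be formed such that 

$$\begin{aligned}
T(\vec u, \vec v) = \vec u\cdot T(\vec v) \quad\text{and}\quad T(\vec v) &= \vec e^{\,i}\,T(\vec e_i, \vec v) \\
&= \vec e_i\,T(\vec e^{\,i}, \vec v)
\end{aligned}$$

For the proof of the last two lines, simply write $\vec u$ in terms of its contravariant or covariant components respectively. 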
All previous statements concerning the symmetry properties of tensors are unchanged because they were made 
in a way independent of basis, though it's easy to see that the symmetry properties of the tensor are reflected in the 
symmetry of the covariant or the contravariant components (but not usually in the mixed components). 

Metric Tensor 

Take as an example the metric tensor: 

$$g(\vec u, \vec v) = \vec u \cdot \vec v. \tag{12.37}$$

The linear function found by pulling off the u from this is the identity operator. 


g(v) = v 


This tensor is symmetric, so this must be reflected in its covariant and contravariant components. Take as a basis the 
vectors 



Let $|\vec e_2| = 1$ and $|\vec e_1| = 2$; the angle between them being 45°. A little geometry shows that 

$$|\vec e^{\,1}| = \frac{1}{\sqrt 2} \quad\text{and}\quad |\vec e^{\,2}| = \sqrt 2$$


Assume this problem is two dimensional in order to simplify things. 
Compute the covariant components: 


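$$g_{11} = g(\vec e_1, \vec e_1) = 4, \qquad g_{12} = g(\vec e_1, \vec e_2) = \sqrt 2, \qquad g_{21} = g(\vec e_2, \vec e_1) = \sqrt 2, \qquad g_{22} = g(\vec e_2, \vec e_2) = 1$$

$$(g_{rc}) = \begin{pmatrix} 4 & \sqrt 2 \\ \sqrt 2 & 1 \end{pmatrix}$$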
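Similarly 

$$g^{11} = g(\vec e^{\,1}, \vec e^{\,1}) = 1/2, \qquad g^{12} = g(\vec e^{\,1}, \vec e^{\,2}) = -1/\sqrt 2, \qquad g^{21} = g(\vec e^{\,2}, \vec e^{\,1}) = -1/\sqrt 2, \qquad g^{22} = g(\vec e^{\,2}, \vec e^{\,2}) = 2$$

$$(g^{rc}) = \begin{pmatrix} 1/2 & -1/\sqrt 2 \\ -1/\sqrt 2 & 2 \end{pmatrix}$$

The mixed components are 

$$g^1{}_1 = g(\vec e^{\,1}, \vec e_1) = 1, \qquad g^1{}_2 = g(\vec e^{\,1}, \vec e_2) = 0, \qquad g^2{}_1 = g(\vec e^{\,2}, \vec e_1) = 0, \qquad g^2{}_2 = g(\vec e^{\,2}, \vec e_2) = 1$$

$$(g^r{}_c) = (\delta^r_c) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{12.38}$$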


I used r and c for the indices to remind you that these are the row and column variables. Multiply the first two matrices 
together and you obtain the third one, the unit matrix. The matrix $(g_{ij})$ is therefore the inverse of the matrix $(g^{ij})$. 
This last result is not general, but is due to the special nature of the tensor g. 
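
The statement that the covariant and contravariant component matrices of g are inverses can be checked numerically. The following is only a sketch: it assumes the direct basis has lengths 2 and 1 with 45° between the vectors, and it places the first basis vector along the x-axis, which is an arbitrary choice. The helper names (`reciprocal_basis`, `components`, `matmul`) are made up for this illustration.

```python
import math

# Direct basis: lengths 2 and 1, 45 degrees apart.
# Putting e_1 along the x-axis is an arbitrary choice for this sketch.
angle = math.radians(45)
e1 = (2.0, 0.0)
e2 = (math.cos(angle), math.sin(angle))


def dot(a, b):
    """Ordinary scalar product of two plane vectors."""
    return a[0] * b[0] + a[1] * b[1]


def reciprocal_basis(a, b):
    """Return (a_up, b_up) with a_up.a = b_up.b = 1 and a_up.b = b_up.a = 0."""
    det = a[0] * b[1] - a[1] * b[0]
    a_up = (b[1] / det, -b[0] / det)
    b_up = (-a[1] / det, a[0] / det)
    return a_up, b_up


def components(first, second):
    """2x2 matrix of scalar products g(first[r], second[c])."""
    return [[dot(p, q) for q in second] for p in first]


def matmul(m, n):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


lower = (e1, e2)
upper = reciprocal_basis(e1, e2)
g_cov = components(lower, lower)     # covariant components
g_contra = components(upper, upper)  # contravariant components
g_mixed = components(upper, lower)   # mixed components
product = matmul(g_cov, g_contra)    # should be the unit matrix
```

Printing `g_cov`, `g_contra`, and `product` lets you compare with the three matrices of Eq. (12.38).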

12.6 Manifolds and Fields 

Until now, all definitions and computations were done in one vector space. This is the same state of affairs as when you 
once learned vector algebra; the only things to do then were addition, scalar products, and cross products. Eventually 
however vector calculus came up and you learned about vector fields and gradients and the like. You have now set up 
enough apparatus to make the corresponding step here. First I would like to clarify just what is meant by a vector field, 
because there is sure to be confusion on this point no matter how clearly you think you understand the concept. Take a 
typical vector field such as the electrostatic field E. E will be some function of position (presumably satisfying Maxwell's 
equations) as indicated at the six different points. 



Does it make any sense to take the vector $\vec E_3$ and add it to the vector $\vec E_5$? These are after all, vectors; can't you 
always add one vector to another vector? Suppose there is also a magnetic field present, say with vectors $\vec B_1$, $\vec B_2$ etc., 
at the same points. Take the magnetic vector at the point #3 and add it to the electric vector there. The reasoning 
would be exactly the same as the previous case; these are vectors, therefore they can be added. The second case is 
clearly nonsense, as should be the first. The electric vector is defined as the force per charge at a point. If you take two 
vectors at two different points, then the forces are on two different objects, so the sum of the forces is not a force on 
anything; it isn’t even defined. 

You can't add an electric vector at one point to an electric vector at another point. These two vectors occupy 
different vector spaces. At a single point in space there are many possible vectors; at this one point, the set of all possible 
electric vectors forms a vector space because they can be added to each other and multiplied by scalars while remaining 
at the same point. By the same reasoning the magnetic vectors at a point form a vector space. Also the velocity vectors. 
You could not add a velocity vector to an electric field vector even at the same point however. These too are in different 
vector spaces. You can picture all these vector spaces as attached to the points in the manifold and somehow sitting 
over them. 

From the above discussion you can see that even to discuss one type of vector field, 
a vector space must be attached to each point of space. If you wish to make a drawing 
of such a system, it is at best difficult. In three dimensional space you could have a three 
dimensional vector space at each point. A crude way of picturing this is to restrict to two 
dimensions and draw a line attached to each point, representing the vector space attached 
to that point. This pictorial representation won’t be used in anything to follow however, so 
you needn’t worry about it. 

The term “vector field” that I’ve been throwing around is just a prescription for selecting one vector out of each 
of the vector spaces. Or, in other words, it is a function that assigns to each point a vector in the vector space at that 
same point. 

There is a minor confusion of terminology here in the use of the word “space.” This could be space in the sense 
of the three dimensional Euclidean space in which we are sitting and doing computations. Each point of the latter will 
have a vector space associated with it. To reduce confusion (I hope) I shall use the word “manifold" for the space over 
which all the vector spaces are built. Thus: To each point of the manifold there is associated a vector space. A vector 
field is a choice of one vector from each of the vector spaces over the manifold. This is a vector field on the manifold. 
In short: The word "manifold" is substituted here for the phrase “three dimensional Euclidean space.” 

(A comment on generalizations. While using the word manifold as above, everything said about it will in fact be 
more general. For example it will still be acceptable in special relativity with four dimensions of space-time. It will also 
be correct in other contexts where the structure of the manifold is non-Euclidean.) 

The point to emphasize here is that most of the work on tensors is already done and that the application to fields 
of vectors and fields of tensors is in a sense a special case. At each point of the manifold there is a vector space to which 
all previous results apply. 

In the examples of vector fields mentioned above (electric field, magnetic field, velocity field) keep your eye on the 
velocity. It will play a key role in the considerations to come, even in considerations of other fields. 
A word of warning about the distinction between a manifold and the vector spaces at each point of the manifold. 
You are accustomed to thinking of three dimensional Euclidean space (the manifold) as a vector space itself. That is, 
the displacement vector between two points is defined, and you can treat these as vectors just like the electric vectors 
at a point. Don't! Treating the manifold as a vector space will cause great confusion. Granted, it happens to be correct 
in this instance, but in attempting to understand these new concepts about vector fields (and tensor fields later), this 
additional knowledge will be a hindrance. For our purposes therefore the manifold will not be a vector space. The 
concept of a displacement vector is therefore not defined. 

Just as vector fields were defined by picking a single vector from each vector space at various points of the manifold, 
a scalar field is similarly an assignment of a number (scalar) to each point. In short then, a scalar field is a function that 
gives a scalar (the dependent variable) for each point of the manifold (the independent variable). 

For each vector space, you can discuss the tensors that act on that space and so, by picking one such tensor for 
each point of the manifold a tensor field is defined. 

A physical example of a tensor field (of second rank) is stress in a solid. This will typically vary from point to 
point. But at each point a second rank tensor is given by the relation between infinitesimal area vectors and internal 
force vectors at that point. Similarly, the dielectric tensor in an inhomogeneous medium will vary with position and will 
therefore be expressed as a tensor field. Of course even in a homogeneous medium the dielectric tensor would be a tensor 
field relating D and E at the same point. It would however be a constant tensor field. Like a uniform electric field, the 
tensors at different points could be thought of as “parallel" to each other (whatever that means). 


12.7 Coordinate Bases 

In order to obtain a handle on this subject and in order to be able to do computations, it is necessary to put a coordinate 
system on the manifold. From this coordinate system there will come a natural way to define the basis vectors at each 
point (and so reciprocal basis vectors too). The orthonormal basis vectors that you are accustomed to in cylindrical and 
spherical coordinates are not “coordinate bases." 

There is no need to restrict the discussion to rectangular or even to orthogonal coordinate systems. A coordinate 
system is a means of identifying different points of the manifold by different sets of numbers. This is done by specifying 
a set of functions: $x^1$, $x^2$, $x^3$, which are the coordinates. (There will be more in more dimensions of course.) These 
functions are real valued functions of points in the manifold. The coordinate axes are defined as in the drawing: 
specify the equations $x^2 = 0$ and $x^3 = 0$ for the $x^1$ coordinate axis. For example in rectangular coordinates 
$x^1 = x$, $x^2 = y$, $x^3 = z$, and the $x$-axis is the line $y = 0$, and $z = 0$. In plane polar coordinates $x^1 = r =$ a constant is 
a circle and $x^2 = \phi =$ a constant is a straight line starting from the origin. 

Start with two dimensions and polar coordinates $r$ and $\phi$. As a basis in this system we routinely use the unit 
vectors $\hat r$ and $\hat\phi$, but is this the only choice? Is it the best choice? Not necessarily. Look at two vectors, the differential 
$d\vec r$ and the gradient $\nabla f$. 

$$d\vec r = \hat r\,dr + \hat\phi\,r\,d\phi \quad\text{and}\quad \nabla f = \hat r\,\frac{\partial f}{\partial r} + \hat\phi\,\frac{1}{r}\frac{\partial f}{\partial\phi} \tag{12.39}$$

And remember the chain rule too. 

$$\frac{df}{dt} = \frac{\partial f}{\partial r}\frac{dr}{dt} + \frac{\partial f}{\partial\phi}\frac{d\phi}{dt} \tag{12.40}$$

In this basis the scalar product is simple, but you pay for it in that the components of the vectors have extra factors 
in them: the $r$ and the $1/r$. An alternate approach is to place the complexity in the basis vectors and not in the 
components. 


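$$\vec e_1 = \hat r, \quad \vec e_2 = r\hat\phi, \quad\text{and}\quad \vec e^{\,1} = \hat r, \quad \vec e^{\,2} = \hat\phi/r \tag{12.41}$$

These are reciprocal bases, as indicated by the notation. Of course the original $\hat r$-$\hat\phi$ is self-reciprocal. The preceding 
equations (12.39) are now 

$$d\vec r = \vec e_1\,dr + \vec e_2\,d\phi \quad\text{and}\quad \nabla f = \vec e^{\,1}\frac{\partial f}{\partial r} + \vec e^{\,2}\frac{\partial f}{\partial\phi} \tag{12.42}$$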
Make another change in notation to make this appear more uniform. Instead of $r$-$\phi$ for the coordinates, 
call them $x^1$ and $x^2$ respectively, then 

$$d\vec r = \vec e_1\,dx^1 + \vec e_2\,dx^2 \quad\text{and}\quad \nabla f = \vec e^{\,1}\frac{\partial f}{\partial x^1} + \vec e^{\,2}\frac{\partial f}{\partial x^2} \tag{12.43}$$

The velocity vector is 

$$\frac{d\vec r}{dt} = \vec e_1\frac{dr}{dt} + \vec e_2\frac{d\phi}{dt}$$

and Eq. (12.40) is 

$$\frac{df}{dt} = \frac{\partial f}{\partial x^i}\frac{dx^i}{dt} = (\nabla f)\cdot\vec v$$


This sort of basis has many technical advantages that aren't at all apparent here. At this point this “coordinate 
basis" is simply a way to sweep some of the complexity out of sight, but with further developments of the subject the 
sweeping becomes shoveling. When you go beyond the introduction found in this chapter, you find that using any basis 
other than a coordinate basis leads to equations that have complicated extra terms, which you want nothing to do with. 

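In spherical coordinates $x^1 = r$, $x^2 = \theta$, $x^3 = \phi$, 

$$\vec e_1 = \hat r, \ \ \vec e_2 = r\hat\theta, \ \ \vec e_3 = r\sin\theta\,\hat\phi \quad\text{and}\quad \vec e^{\,1} = \hat r, \ \ \vec e^{\,2} = \hat\theta/r, \ \ \vec e^{\,3} = \hat\phi/(r\sin\theta)$$

The velocity components are now 

$$\frac{dx^i}{dt} = \left\{\frac{dr}{dt},\ \frac{d\theta}{dt},\ \frac{d\phi}{dt}\right\}, \quad\text{and}\quad \vec v = \vec e_i\,\frac{dx^i}{dt} \tag{12.44}$$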


This last equation is central to figuring out the basis vectors in an unfamiliar coordinate system. 

The use of $x^1$, $x^2$, and $x^3$ ($x^i$) for the coordinate system makes the notation uniform. Despite the use of 
superscripts, these coordinates are not the components of any vectors, though their time derivatives are. 

In ordinary rectangular coordinates, if a particle is moving along the $x^1$-axis (the $x$-axis) then $dx^2/dt = 0 = dx^3/dt$. 
Also, the velocity will be in the $x$ direction and of size $dx^1/dt$. 

$$\vec e_1 = \hat x$$

as you would expect. Similarly 

$$\vec e_2 = \hat y, \quad \vec e_3 = \hat z.$$

In a variation of rectangular coordinates in the plane the axes are not orthogonal to each other, but are rectilinear 
anyway. 



Still keep the requirement of Eq. (12.44) 

$$\vec v = \vec e_i\,\frac{dx^i}{dt} = \vec e_1\frac{dx^1}{dt} + \vec e_2\frac{dx^2}{dt} \tag{12.45}$$


If the particle moves along the $x^1$-axis (or parallel to it) then by the definition of the axes, $x^2$ is a constant and 
$dx^2/dt = 0$. Suppose that the coordinates measure centimeters, so that the perpendicular distance between the lines 
is one centimeter. The distance between the points (0,0) and (1,0) is then $1\,\text{cm}/\sin\alpha = \csc\alpha$ cm. If in $\Delta t$ = one 
second, the particle moves from the first to the second of these points, $\Delta x^1$ = one cm, so $dx^1/dt = 1$ cm/sec. The speed 
however, is $\csc\alpha$ cm/sec because the distance moved is greater by that factor. This means that 

$$|\vec e_1| = \csc\alpha$$

and this is greater than one; it is not a unit vector. The magnitude of $\vec e_2$ is the same. The dot product of these two 
vectors is $\vec e_1\cdot\vec e_2 = \cos\alpha/\sin^2\alpha$. 

Reciprocal Coordinate Basis 

Construct the reciprocal basis vectors from the direct basis by the equation 


$$\vec e^{\,i}\cdot\vec e_j = \delta^i_j$$

In rectangular coordinates the direct and reciprocal bases coincide because the basis is orthonormal. For the tilted basis 
of Eq. (12.45), 

$$\vec e_2\cdot\vec e^{\,2} = 1 = |\vec e_2|\,|\vec e^{\,2}|\cos(90^\circ - \alpha) = (\csc\alpha)\,|\vec e^{\,2}|\sin\alpha = |\vec e^{\,2}|$$

The reciprocal basis vectors in this case are unit vectors. 
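
The properties claimed for the tilted basis of Eq. (12.45) are easy to test numerically. This sketch assumes the x^1-axis is drawn horizontally and that the coordinate lines are one unit apart measured perpendicularly; the function name `tilted_bases` and the layout of its return value are invented for the illustration.

```python
import math


def dot(a, b):
    """Ordinary scalar product of two plane vectors."""
    return a[0] * b[0] + a[1] * b[1]


def norm(a):
    """Length of a plane vector."""
    return math.sqrt(dot(a, a))


def tilted_bases(alpha):
    """Direct and reciprocal coordinate bases for axes meeting at angle alpha.

    Assumes the x^1-axis is horizontal and the coordinate lines are one
    unit apart, measured perpendicular to the lines.
    """
    csc = 1.0 / math.sin(alpha)
    e1 = (csc, 0.0)                                       # along the x^1-axis
    e2 = (csc * math.cos(alpha), csc * math.sin(alpha))   # along the x^2-axis
    det = e1[0] * e2[1] - e1[1] * e2[0]
    e1_up = (e2[1] / det, -e2[0] / det)   # perpendicular to e2
    e2_up = (-e1[1] / det, e1[0] / det)   # perpendicular to e1
    return (e1, e2), (e1_up, e2_up)


alpha = math.radians(60)
(e1, e2), (e1_up, e2_up) = tilted_bases(alpha)
```

With these in hand you can compare `norm(e1)` with csc α, `dot(e1, e2)` with cos α / sin² α, and confirm that `norm(e1_up)` and `norm(e2_up)` are both one for any angle you try.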

The direct basis is defined so that the components of velocity are as simple as possible. In contrast, the components 
of the gradient of a scalar field are equally simple provided that they are expressed in the reciprocal basis. If you try to 
use the same basis for both you can, but the resulting equations are a mess. 

In order to compute the components of grad $f$ (where $f$ is a scalar field) start with its definition, and an appropriate 
definition should not depend on the coordinate system. It ought to be some sort of geometric statement that you can 
translate into any particular coordinate system that you want. One way to define grad $f$ is that it is that vector pointing 
in the direction of maximum increase of $f$ and equal in magnitude to $df/d\ell$ where $\ell$ is the distance measured in 
that direction. This is the first statement in section 8.5. While correct, this definition does not easily lend itself to 
computations. 

Instead, think of the particle moving through the manifold. As a function of time it sees changing values of $f$. 
The time rate of change of $f$ as felt by this particle is given by a scalar product of the particle's velocity and the gradient 
of $f$. This is essentially the same as Eq. (8.16), though phrased in a rather different way. Write this statement in terms 
of coordinates 

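$$\frac{d}{dt} f\bigl(x^1(t), x^2(t), x^3(t)\bigr) = \vec v\cdot\operatorname{grad} f$$

The left hand side is (by the chain rule) 

$$\frac{\partial f}{\partial x^1}\bigg|_{x^2,x^3}\frac{dx^1}{dt} + \frac{\partial f}{\partial x^2}\bigg|_{x^1,x^3}\frac{dx^2}{dt} + \frac{\partial f}{\partial x^3}\bigg|_{x^1,x^2}\frac{dx^3}{dt} = \frac{\partial f}{\partial x^1}\frac{dx^1}{dt} + \frac{\partial f}{\partial x^2}\frac{dx^2}{dt} + \frac{\partial f}{\partial x^3}\frac{dx^3}{dt} \tag{12.46}$$

$\vec v$ is expressed in terms of the direct basis by 

$$\vec v = \vec e_i\,\frac{dx^i}{dt},$$

now express grad $f$ in the reciprocal basis 

$$\operatorname{grad} f = \vec e^{\,i}\bigl(\operatorname{grad} f\bigr)_i \tag{12.47}$$

The way that the scalar product looks in terms of these bases, Eq. (12.33) is 

$$\vec v\cdot\operatorname{grad} f = \vec e_i\,\frac{dx^i}{dt}\cdot\vec e^{\,j}\bigl(\operatorname{grad} f\bigr)_j = v^i\bigl(\operatorname{grad} f\bigr)_i \tag{12.48}$$

Compare the two equations (12.46) and (12.48) and you see 

$$\operatorname{grad} f = \vec e^{\,i}\,\frac{\partial f}{\partial x^i} \tag{12.49}$$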
For a formal proof of this statement consider three cases. When the particle is moving along the $x^1$ direction ($x^2$ & 
$x^3$ constant) only one term appears on each side of (12.46) and (12.48) and you can divide by $v^1 = dx^1/dt$. Similarly 
for $x^2$ and $x^3$. As usual with partial derivatives, the symbol $\partial f/\partial x^1$ assumes that the other coordinates $x^2$ and $x^3$ are 
constant. 

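For polar coordinates this equation for the gradient reads, using Eq. (12.41), 

$$\operatorname{grad} f = \vec e^{\,1}\frac{\partial f}{\partial x^1} + \vec e^{\,2}\frac{\partial f}{\partial x^2} = \bigl(\hat r\bigr)\frac{\partial f}{\partial r} + \Bigl(\frac{1}{r}\hat\phi\Bigr)\frac{\partial f}{\partial\phi}$$

which is the standard result, Eq. (8.27). Notice again that the basis vectors are not dimensionless. They can't be, 
because $\partial f/\partial r$ doesn’t have the same dimensions as $\partial f/\partial\phi$. 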
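
As a numerical cross-check of the polar form of the gradient, the sketch below builds grad f from the reciprocal coordinate basis and compares it with the gradient computed directly in rectangular components. The sample field f = x²y, the step size `h`, and all function names are assumptions chosen only for this illustration; the partial derivatives are estimated by central differences.

```python
import math


def f_polar(r, phi):
    """Sample scalar field, f = x^2 y, written in polar coordinates."""
    x, y = r * math.cos(phi), r * math.sin(phi)
    return x * x * y


def grad_from_coordinate_basis(r, phi, h=1e-6):
    """grad f = e^1 df/dr + e^2 df/dphi, with e^1 = r_hat, e^2 = phi_hat / r."""
    df_dr = (f_polar(r + h, phi) - f_polar(r - h, phi)) / (2 * h)
    df_dphi = (f_polar(r, phi + h) - f_polar(r, phi - h)) / (2 * h)
    r_hat = (math.cos(phi), math.sin(phi))
    phi_hat = (-math.sin(phi), math.cos(phi))
    e1_up = r_hat
    e2_up = (phi_hat[0] / r, phi_hat[1] / r)
    # Return the rectangular (x, y) components of the resulting vector.
    return (e1_up[0] * df_dr + e2_up[0] * df_dphi,
            e1_up[1] * df_dr + e2_up[1] * df_dphi)


def grad_cartesian(x, y):
    """Exact gradient of x^2 y in rectangular components."""
    return (2 * x * y, x * x)
```

Evaluate both at the same point, for example `grad_from_coordinate_basis(1.5, 0.7)` and `grad_cartesian(1.5 * math.cos(0.7), 1.5 * math.sin(0.7))`, and the two pairs of numbers should agree to within the finite-difference error.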


Example 

I want an example to show that all this formalism actually gives the correct answer in a 
special case for which you can also compute all the results in the traditional way. Draw 
parallel lines a distance 1 cm apart and another set of parallel lines also a distance 1 cm 
apart intersecting at an angle $\alpha$ between them. These will be the constant values of the 
functions defining the coordinates, and will form a coordinate system labeled $x^1$ and $x^2$. 

The horizontal lines are the equations $x^2 = 0$, $x^2 = 1$ cm, etc. 

Take the case of the non-orthogonal rectilinear coordinates again. The component of grad $f$ in the $\vec e^{\,1}$ direction 
is $\partial f/\partial x^1$, which is the derivative of $f$ with respect to $x^1$ holding $x^2$ constant, and this derivative is not in the direction 
along $\vec e^{\,1}$, but in the direction where $x^2 =$ a constant and that is along the $x^1$-axis, along $\vec e_1$. As a specific example to 
show that this makes sense, take a particular $f$ defined by 

$$f(x^1, x^2) = x^1$$

For this function 

$$\operatorname{grad} f = \vec e^{\,1}\frac{\partial f}{\partial x^1} + \vec e^{\,2}\frac{\partial f}{\partial x^2} = \vec e^{\,1}$$

$\vec e^{\,1}$ is perpendicular to the $x^2$-axis, the line $x^1 =$ constant (as it should be). Its magnitude is the magnitude of $\vec e^{\,1}$, which 
is one. 

To verify that this magnitude is correct, calculate it directly from the definition. The magnitude of the gradient 
is the magnitude of $df/d\ell$ where $\ell$ is measured in the direction of the gradient, that is, in the direction $\vec e^{\,1}$. 

$$\frac{df}{d\ell} = \frac{\partial f}{\partial x^1}\frac{dx^1}{d\ell} = 1\cdot 1 = 1$$

Why 1 for $dx^1/d\ell$? The coordinate lines in the picture are $x^1 = 0, 1, 2, \ldots$. When you move on the straight line 
perpendicular to the lines $x^1 =$ constant (along $\vec e^{\,1}$), and go from $x^1 = 1$ to $x^1 = 2$, then both $\Delta x^1$ and $\Delta\ell$ are one. 

Metric Tensor 

The simplest tensor field beyond the gradient vector above would be the metric tensor, which I’ve been implicitly using 
all along whenever a scalar product arose. It is defined at each point by 


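$$g(\vec a, \vec b) = \vec a\cdot\vec b \tag{12.50}$$

Compute the components of $g$ in plane polar coordinates. The contravariant components of $g$ are from Eq. (12.41) 

$$g^{ij} = \vec e^{\,i}\cdot\vec e^{\,j} = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}$$

Covariant: 

$$g_{ij} = \vec e_i\cdot\vec e_j = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}$$

Mixed: 

$$g^i{}_j = \vec e^{\,i}\cdot\vec e_j = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$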
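
The three sets of components of the metric tensor in plane polar coordinates can be generated directly from the coordinate basis of Eq. (12.41). This is a small sketch only; the function name `polar_metric` and the choice to return the three matrices as nested lists are assumptions made for the illustration.

```python
import math


def polar_metric(r, phi):
    """Covariant, contravariant, and mixed components of g in plane polar coordinates."""
    # Direct coordinate basis: e_1 = r_hat, e_2 = r * phi_hat.
    e_down = [(math.cos(phi), math.sin(phi)),
              (-r * math.sin(phi), r * math.cos(phi))]
    # Reciprocal basis: e^1 = r_hat, e^2 = phi_hat / r.
    e_up = [(math.cos(phi), math.sin(phi)),
            (-math.sin(phi) / r, math.cos(phi) / r)]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    g_cov = [[dot(a, b) for b in e_down] for a in e_down]
    g_contra = [[dot(a, b) for b in e_up] for a in e_up]
    g_mixed = [[dot(a, b) for b in e_down] for a in e_up]
    return g_cov, g_contra, g_mixed
```

At r = 3, for instance, the lower right entries should come out as 9 for the covariant matrix and 1/9 for the contravariant one, whatever the angle.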


12.8 Basis Change 

If you have two different sets of basis vectors you can compute the transformation on the components in going from 
one basis to the other, and in dealing with fields, a different set of basis vectors necessarily arises from a different set of 
coordinates on the manifold. It is convenient to compute the transformation matrices directly in terms of the different 
coordinate functions. Call the two sets of coordinates $x^i$ and $y^i$. Each of them defines a set of basis vectors such that 
a given velocity is expressed as 

$$\vec v = \vec e_i\,\frac{dx^i}{dt} = \vec e^{\,\prime}_j\,\frac{dy^j}{dt} \tag{12.51}$$

What you need is an expression for $\vec e^{\,\prime}_j$ in terms of $\vec e_i$ at each point. To do this, take a particular path for the particle: 
along the $y^1$-direction ($y^2$ & $y^3$ constant). The right hand side is then 

$$\vec e^{\,\prime}_1\,\frac{dy^1}{dt}$$

Divide by $dy^1/dt$ to obtain 

$$\vec e^{\,\prime}_1 = \vec e_i\,\frac{dx^i}{dt}\bigg/\frac{dy^1}{dt}$$

But this quotient is just 

$$\vec e^{\,\prime}_1 = \vec e_i\,\frac{\partial x^i}{\partial y^1}\bigg|_{y^2,y^3}$$

And in general 

$$\vec e^{\,\prime}_j = \vec e_i\,\frac{\partial x^i}{\partial y^j} \tag{12.52}$$


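Do a similar calculation for the reciprocal vectors 

$$\operatorname{grad} f = \vec e^{\,i}\,\frac{\partial f}{\partial x^i} = \vec e^{\,\prime j}\,\frac{\partial f}{\partial y^j}$$

Take the case for which $f = y^k$, then 

$$\frac{\partial f}{\partial y^j} = \frac{\partial y^k}{\partial y^j} = \delta^k_j$$

which gives 

$$\vec e^{\,\prime k} = \vec e^{\,i}\,\frac{\partial y^k}{\partial x^i} \tag{12.53}$$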


The transformation matrices for the direct and the reciprocal basis are inverses of each other. In the present 
context, this becomes 

$$\vec e^{\,\prime k}\cdot\vec e^{\,\prime}_j = \delta^k_j = \vec e^{\,\ell}\,\frac{\partial y^k}{\partial x^\ell}\cdot\vec e_i\,\frac{\partial x^i}{\partial y^j} = \delta^\ell_i\,\frac{\partial y^k}{\partial x^\ell}\frac{\partial x^i}{\partial y^j} = \frac{\partial y^k}{\partial x^i}\frac{\partial x^i}{\partial y^j} = \frac{\partial y^k}{\partial y^j}$$

The matrices $\partial x^i/\partial y^j$ and its inverse matrix, $\partial y^k/\partial x^i$, are called Jacobian matrices. When you do multiple 
integrals and have to change coordinates, the determinant of one or the other of these matrices will appear as a factor 
in the integral. 

As an example, compute the change from rectangular to polar coordinates 


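$$x^1 = x, \quad y^1 = r, \quad x^2 = y, \quad y^2 = \phi$$

$$x = r\cos\phi, \quad r = \sqrt{x^2 + y^2}, \quad y = r\sin\phi, \quad \phi = \tan^{-1} y/x$$

$$\vec e^{\,\prime}_j = \vec e_i\,\frac{\partial x^i}{\partial y^j}$$

$$\vec e^{\,\prime}_1 = \vec e_1\frac{\partial x^1}{\partial y^1} + \vec e_2\frac{\partial x^2}{\partial y^1} = \hat x\,\frac{\partial x}{\partial r} + \hat y\,\frac{\partial y}{\partial r} = \hat x\cos\phi + \hat y\sin\phi = \hat r$$

$$\vec e^{\,\prime}_2 = \vec e_1\frac{\partial x^1}{\partial y^2} + \vec e_2\frac{\partial x^2}{\partial y^2} = \hat x\,\frac{\partial x}{\partial\phi} + \hat y\,\frac{\partial y}{\partial\phi} = \hat x(-r\sin\phi) + \hat y(r\cos\phi) = r\hat\phi$$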
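
The rectangular-to-polar example can be repeated numerically. The sketch below writes out the two Jacobian matrices by hand, reads the new direct basis off the columns of one and the new reciprocal basis off the rows of the other, as in Eqs. (12.52) and (12.53). The function names are hypothetical, and the old basis is taken to be the usual unit vectors along x and y.

```python
import math


def jacobian_x_of_y(r, phi):
    """Matrix J[i][j] = d x^i / d y^j, with (x^1, x^2) = (x, y) and (y^1, y^2) = (r, phi)."""
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi), r * math.cos(phi)]]


def jacobian_y_of_x(x, y):
    """Matrix K[k][i] = d y^k / d x^i for r = sqrt(x^2 + y^2), phi = atan(y/x)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]


def new_direct_basis(r, phi):
    """Columns of J: the primed e_j = e_i dx^i/dy^j, in rectangular components."""
    J = jacobian_x_of_y(r, phi)
    return [(J[0][j], J[1][j]) for j in range(2)]


def new_reciprocal_basis(r, phi):
    """Rows of K: the primed e^k = e^i dy^k/dx^i, in rectangular components."""
    K = jacobian_y_of_x(r * math.cos(phi), r * math.sin(phi))
    return [(K[k][0], K[k][1]) for k in range(2)]
```

Multiplying `jacobian_y_of_x` by `jacobian_x_of_y` at the same point should return the unit matrix, which is the statement that the two transformation matrices are inverses of each other.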


Knowing the change in the basis vectors, the change of the components of any tensor is found by computing it in 
the new basis and using the linearity of the tensor itself. The new components will be linear combinations of those from 
the old basis. 

A realistic example using non-orthonormal bases appears in special relativity. Here the manifold is four dimensional 
instead of three and the coordinate changes of interest represent Lorentz transformations. Points in space-time (“events”) 
can be described by rectangular coordinates $(ct, x, y, z)$, which are concisely denoted by $x^i$ 

$$i = (0, 1, 2, 3) \quad\text{where}\quad x^0 = ct, \ \ x^1 = x, \ \ x^2 = y, \ \ x^3 = z$$

The introduction of the factor $c$ into $x^0$ is merely a question of scaling. It also makes the units the same on all axes. 
The basis vectors associated with this coordinate system point along the directions of the axes. 

This manifold is not Euclidean however so that these vectors are not unit vectors in the usual sense. We have 

$$\vec e_0\cdot\vec e_0 = -1 \qquad \vec e_1\cdot\vec e_1 = 1 \qquad \vec e_2\cdot\vec e_2 = 1 \qquad \vec e_3\cdot\vec e_3 = 1$$

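and they are orthogonal pairwise. The reciprocal basis vectors are defined in the usual way, 

$$\vec e^{\,i}\cdot\vec e_j = \delta^i_j$$

so that 

$$\vec e^{\,0} = -\vec e_0 \qquad \vec e^{\,1} = \vec e_1 \qquad \vec e^{\,2} = \vec e_2 \qquad \vec e^{\,3} = \vec e_3$$

The contravariant (also covariant) components of the metric tensor are 

$$g^{ij} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = g_{ij} \tag{12.54}$$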


An observer moving in the +x direction with speed v will have his own coordinate system with which to describe 
the events of space-time. The coordinate transformation is 


$$x'^0 = ct' = \frac{ct - \frac{v}{c}x}{\sqrt{1 - v^2/c^2}} = \frac{x^0 - \frac{v}{c}x^1}{\sqrt{1 - v^2/c^2}}$$

$$x'^1 = x' = \frac{x - vt}{\sqrt{1 - v^2/c^2}} = \frac{x^1 - \frac{v}{c}x^0}{\sqrt{1 - v^2/c^2}} \tag{12.55}$$

$$x'^2 = y' = x^2 = y \qquad x'^3 = z' = x^3 = z$$

You can check that these equations represent the transformation to an observer moving in the $+x$ direction by asking 
where the moving observer’s origin is as a function of time: It is at $x'^1 = 0$ or $x - vt = 0$, giving $x = vt$ as the locus of 
the moving observer's origin. 

The graph of the coordinates is as usual defined by the equations (say for the $x'^0$-axis) that $x'^1$, $x'^2$, $x'^3$ are 
constants such as zero. Similarly for the other axes. 




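Find the basis vectors in the transformed system by using equation (12.52). 

$$\vec e^{\,\prime}_j = \vec e_i\,\frac{\partial x^i}{\partial y^j}$$

In the present case the $y^j$ are $x'^j$ and we need the inverse of the equations (12.55). They are found by changing $v$ to 
$-v$ and interchanging primed and unprimed variables. 

$$x^0 = \frac{x'^0 + \frac{v}{c}x'^1}{\sqrt{1 - v^2/c^2}} \qquad x^1 = \frac{x'^1 + \frac{v}{c}x'^0}{\sqrt{1 - v^2/c^2}}$$

$$\vec e^{\,\prime}_0 = \vec e_i\,\frac{\partial x^i}{\partial x'^0} = \vec e_0\,\frac{1}{\sqrt{1 - v^2/c^2}} + \vec e_1\,\frac{v/c}{\sqrt{1 - v^2/c^2}}$$

$$\vec e^{\,\prime}_1 = \vec e_i\,\frac{\partial x^i}{\partial x'^1} = \vec e_0\,\frac{v/c}{\sqrt{1 - v^2/c^2}} + \vec e_1\,\frac{1}{\sqrt{1 - v^2/c^2}} \tag{12.56}$$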

It is easy to verify that these new vectors point along the primed axes as they should. They also have the property 
that they are normalized to plus or minus one respectively as are the original untransformed vectors. (How about the 
reciprocal vectors?) 
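
The normalization of the transformed basis vectors of Eq. (12.56) takes only a few lines to check. This sketch works in the (ct, x) plane alone, uses the metric components −1 and +1 from Eq. (12.54), and writes β for v/c; the names `minkowski_dot` and `boosted_basis` are invented for the illustration.

```python
import math

# g_ij restricted to the (ct, x) plane, as in Eq. (12.54).
METRIC = [[-1.0, 0.0],
          [0.0, 1.0]]


def minkowski_dot(a, b):
    """Scalar product of two vectors given by their components in the unprimed direct basis."""
    return sum(METRIC[i][j] * a[i] * b[j] for i in range(2) for j in range(2))


def boosted_basis(beta):
    """Components of the primed e_0 and e_1 in the unprimed basis; beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    e0_prime = (gamma, gamma * beta)
    e1_prime = (gamma * beta, gamma)
    return e0_prime, e1_prime
```

For any |β| < 1, `minkowski_dot` applied to the first vector with itself should give −1, to the second with itself +1, and to the pair zero.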

As an example applying all this apparatus, do the transformation of the components of a second rank tensor, the 
electromagnetic field tensor. This tensor is the function that acts on the current density (four dimensional vector) and 
gives the force density (also a four-vector). Its covariant components are 


$$(F_{ij}) = F(\vec e_i, \vec e_j) = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix} \tag{12.57}$$


where the E’s and B’s are the conventional electric and magnetic field components. Compute a sample component of 
this tensor in the primed coordinate system. 


$$F'_{20} = F(\vec e^{\,\prime}_2, \vec e^{\,\prime}_0) = F\!\left(\vec e_2,\ \vec e_0\,\frac{1}{\sqrt{1 - v^2/c^2}} + \vec e_1\,\frac{v/c}{\sqrt{1 - v^2/c^2}}\right) = \frac{1}{\sqrt{1 - v^2/c^2}}\,F_{20} + \frac{v/c}{\sqrt{1 - v^2/c^2}}\,F_{21}$$

or in terms of the E and B notation, 

$$E'_y = \frac{1}{\sqrt{1 - v^2/c^2}}\left[E_y - \frac{v}{c}B_z\right]$$

Since $v$ is a velocity in the $+x$ direction this in turn is 

$$E'_y = \frac{1}{\sqrt{1 - v^2/c^2}}\left[E_y + \frac{1}{c}\bigl(\vec v\times\vec B\bigr)_y\right] \tag{12.58}$$


Except possibly for the factor in front of the brackets, this is a familiar, physically correct equation of elementary 
electromagnetic theory. A charge is always at rest in its own reference system. In its own system, the only force it feels 
is the electric force because its velocity with respect to itself is zero. The electric field that it experiences is $\vec E'$, not the 
$\vec E$ of the outside world. This calculation tells you that this force $q\vec E'$ is the same thing that I would expect if I knew the 
Lorentz force law, $\vec F = q[\vec E + \vec v\times\vec B]$. The factor of $\sqrt{1 - v^2/c^2}$ appears because force itself has some transformation 
laws that are not as simple as you would expect. 
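
The transformation of every component of the field tensor follows from the same bilinearity used for the sample component above. The sketch below assumes the covariant components are arranged as in Eq. (12.57) and that the boost is along +x with β = v/c; the function names and the sample field values used afterwards are made up for illustration.

```python
import math


def field_tensor(E, B):
    """Covariant components F_ij laid out as in Eq. (12.57); E and B are 3-tuples."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]


def inverse_boost(beta):
    """M[k][i] = d x^k / d x'^i for a boost along +x, so that e'_i = e_k M[k][i]."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    M = [[1.0 if k == i else 0.0 for i in range(4)] for k in range(4)]
    M[0][0] = gamma
    M[0][1] = gamma * beta
    M[1][0] = gamma * beta
    M[1][1] = gamma
    return M


def transform_covariant(F, M):
    """F'_ij = F(e'_i, e'_j) = M[k][i] M[l][j] F_kl, by bilinearity of the tensor."""
    n = len(F)
    return [[sum(M[k][i] * M[l][j] * F[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)]
            for i in range(n)]
```

For example, `transform_covariant(field_tensor((1, 2, 3), (4, 5, 6)), inverse_boost(0.6))[2][0]` can be compared by hand with the bracketed expression for the transformed y-component of the electric field, and the `[1][0]` entry shows what happens to the component of E along the motion.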


Exercises 

1 On the three dimensional vector space of real quadratic polynomials in $x$, define the linear functional 
$F(f) = \int_0^1 dx\, f(x)$. Suppose that 1, $x$, and $x^2$ are an orthonormal basis, then what vector $\vec A$ represents this functional $F$ so 
that $\vec A\cdot\vec v = F(f)$, where the vector $\vec v$ means a quadratic polynomial, as $f(x) = a + bx + cx^2$. Ans: $\vec A = 1 + \frac{1}{2}x + \frac{1}{3}x^2$ 

2 In the preceding example, take the scalar product to be $\langle f, g\rangle = \int_{-1}^{1} dx\, f(x)g(x)$ and find the vector that represents 
the functional in this case. Ans: $\vec A = \frac{1}{2} + \frac{3}{4}x$ 

3 In the first exercise above, what if the polynomials can have arbitrary order? Ans: $\vec A = -\frac{1}{x}\ln(1 - x)$ (Not quite right 
because this answer is not a polynomial, so it is not a vector in the original space. There really is no correct answer to 
this question as stated.) 
Problems 

12.1 Does the function T defined by T(v ) = v + c with c a constant satisfy the definition of linearity? 

12.2 Let the set X be the positive integers. Let the set Y be all real numbers. Consider the following sets and determine 
if they are relations between X and Y and if they are functions. 

$\{(0, 0), (1, 2.0), (3, -\pi), (0, 1.0), (-1, e)\}$ 

$\{(0, 0), (1, 2.0), (3, -\pi), (0, 1.0), (2, e)\}$ 

$\{(0, 0), (1, 2.0), (3, -\pi), (4, 1.0), (2, e)\}$ 

$\{(0, 0), (5, 5.5), (5, \pi), (3, -2.0), (7, 8)\}$ 


12.3 Starting from the definition of the tensor of inertia in Eq. (12.1) and using the defining equation for components 
of a tensor, compute the components of I . 

12.4 Find the components of the tensor relating d and F in the example of Eq. (12.2) 

12.5 The product of tensors is defined to be just the composition of functions for the second rank tensor viewed as a 
vector variable. If S and T are such tensors, then ( ST)(v ) = S(T( v)) (by definition) Compute the components of ST 
in terms of the components of S and of T. Express the result both in terms of index notation and matrices. Ans: This 
is matrix multiplication. 

12.6 (a) The two tensors ${}^1_1T$ and ${}^1_1\tilde T$ are derived from the same bilinear functional ${}^0_2T$, in the Eqs. (12.20)-(12.22). 
Prove that for arbitrary $\vec u$ and $\vec v$, 

$$\vec u\cdot{}^1_1T(\vec v) = {}^1_1\tilde T(\vec u)\cdot\vec v$$

(If it's less confusing to remove all the sub- and superscripts, do so.) 

(b) If you did this by writing everything in terms of components, do it again without components and just using the 
nature of these as functions. (If you did it without components, do it again using components.) 

12.7 What is the significance of a tensor satisfying the relation $\tilde T[T(\vec v)] = T[\tilde T(\vec v)] = \vec v$ for all $\vec v$? Look at what effect 
it has on the scalar product of two vectors if you let $T$ act on both. That is, what is the scalar product of $T\vec u$ and $T\vec v$? 

12.8 Carry out the construction indicated in section 12.3 to show the dielectric tensor is symmetric. 
12.9 Fill in the details of the proof that the alternating tensor is unique up to a factor. 

12.10 Compute the components of such an alternating tensor in two dimensions. 

Ans: The matrix is anti-symmetric. 

12.11 Take an alternating tensor in three dimensions, pull off one of the arguments in order to obtain a vector valued 
function of two vector variables. See what the result looks like, and in particular, write it out in component form. 

Ans: You should find a cross product here. 

12.12 Take a basis $\vec e_1 = 2\hat x$, $\vec e_2 = \hat x + 2\hat y$. Compute the reciprocal basis. Find the components of the vectors $\vec A = \hat x - \hat y$ 
and $\vec B = \hat y$ in each basis and compute $\vec A\cdot\vec B$ three different ways: in the $\vec e_1$-$\vec e_2$ basis, in its reciprocal basis, and mixed 
($\vec A$ in one basis and $\vec B$ in the reciprocal basis). A fourth way uses the $\hat x$-$\hat y$ basis. 

12.13 Show that if the direct basis vectors have unit length along the directions of $\overrightarrow{OA}$ and 
$\overrightarrow{OB}$ then the components of $\vec v$ in the direct basis are the lengths $OA$ and $OB$. What are the 
components in the reciprocal basis? 

12.14 What happens to the components of the alternating tensor when a change of basis is 
made? Show that the only thing that can happen is that all the components are multiplied by the 
same number (which is the determinant of the transformation). Compute this explicitly in two 
dimensions. 



12.15 A given tensor (viewed as a function of two vector variables) has the property that it equals zero whenever the 
two arguments are the same, T(v,v) = 0. Show that it is antisymmetric. This is also true if it is a function of more 
than two variables and the above equation holds on some pair of arguments. Consider $\vec v = \alpha\vec u + \beta\vec w$ 

12.16 If the components of the alternating tensor are (in three dimensions) $e_{ijk}$ where $e_{123} = 1$, compute $e^{ijk}$. Compute 

$$e^{ijk}e_{\ell mk}, \qquad e^{ijk}e_{\ell jk}, \qquad e^{ijk}e_{ijk}$$


12.17 In three dimensions three non-collinear vectors from a point define a volume, that of the parallelepiped included 
between them. This defines a number as a function of three vectors. Show that if the volume is allowed to be negative 
when one of the vectors reverses that this defines a trilinear functional and that it is completely antisymmetric, an 
alternating tensor. (Note problem 12.15.) If the units of volume are chosen to correspond to the units of length of the 
basis vectors so that three one inch long perpendicular vectors enclose one cubic inch as opposed to 16.387 cm$^3$ then the 
functional is called “the” alternating tensor. Find all its components. 

12.18 Find the direct coordinate basis in spherical coordinates, also the reciprocal basis. 

Ans: One set is $\hat r$, $\hat\theta/r$, $\hat\phi/(r\sin\theta)$ (now which ones are these?) 

12.19 Draw examples of the direct and reciprocal bases (to scale) for the example in Eq. (12.45). Do this for a wide 
range of angles between the axes. 

12.20 Show that the area in two dimensions enclosed by the infinitesimal parallelogram between $x^1$ and $x^1 + dx^1$ and 
$x^2$ and $x^2 + dx^2$ is $\sqrt{g}\,dx^1 dx^2$ where $g$ is the determinant of $(g_{ij})$. 

12.21 Compute the transformation laws for the other components of the electromagnetic field tensor. 

12.22 The divergence of a tensor field is defined as above for a vector field. $T$ is a tensor viewed as a vector valued
function of a vector variable

$$\operatorname{div} T = \lim_{V\to 0}\frac{1}{V}\oint T\big(d\vec A\big)$$

It is a vector. Note: As defined here it requires the addition of vectors at two different points of the manifold, so it must
be assumed that you can move a vector over parallel to itself and add it to a vector at another point. Had I not made
the point in the text about not doing this, perhaps you wouldn't have noticed the assumption involved, but now do it
anyway. Compute $\operatorname{div} T$ in rectangular coordinates using the $\hat x$, $\hat y$, $\hat z$ basis.

Ans: $\partial T^{ij}/\partial x^j$ in coordinate basis.

12.23 Compute $\operatorname{div} T$ in cylindrical coordinates using both the coordinate basis and the usual unit vector ($\hat r$, $\hat\phi$, $\hat z$) basis.
This is where you start to see why the coordinate basis has advantages. 

12.24 Show that $g^{ij}g_{j\ell} = \delta^i_\ell$.

12.25 If you know what it means for vectors at two different points to be parallel, give a definition for what it means 
for two tensors to be parallel. 

12.26 Fill in the missing steps in the derivation following Eq. (12.24), and show that the alternating tensor is unique up 
to a factor. 

12.27 The divergence of a vector field computed in a coordinate basis is $\operatorname{div} F = \partial F^i/\partial x^i$. The divergence computed
in standard cylindrical coordinates is Eq. (9.15). Show that these results agree. 

12.28 From Eq. (12.31) verify the stated properties of the $\omega = 0$ solution.


Vector Calculus 2 


There's more to the subject of vector calculus than the material in chapter nine. There are a couple of types of line 
integrals and there are some basic theorems that relate the integrals to the derivatives, sort of like the fundamental 
theorem of calculus that relates the integral to the anti-derivative in one dimension. 

13.1 Integrals 

Recall the definition of the Riemann integral from section 1.6. 

$$\int_a^b dx\,f(x) = \lim_{\Delta x_k\to 0}\sum_{k=1}^{N} f(\xi_k)\,\Delta x_k \tag{13.1}$$

This refers to a function of a single variable, integrated along that one dimension. 

The basic idea is that you divide a complicated thing into little pieces to get an approximate answer. Then you 
refine the pieces into still smaller ones to improve the answer and finally take the limit as the approximation becomes 
perfect. 

What is the length of a curve in the plane? Divide the curve into a lot of small pieces, then if the pieces are small 
enough you can use the Pythagorean Theorem to estimate the length of each piece. 


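$$\Delta\ell_k = \sqrt{(\Delta x_k)^2 + (\Delta y_k)^2}$$

[Figure: a short segment $\Delta\ell_k$ of the curve drawn as the hypotenuse of a right triangle with legs $\Delta x_k$ and $\Delta y_k$.]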


The whole curve then has a length that you estimate to be the sum of all these intervals. Finally take the limit to get 
the exact answer. 

$$\sum_k\Delta\ell_k = \sum_k\sqrt{(\Delta x_k)^2 + (\Delta y_k)^2} \longrightarrow \int d\ell = \int\sqrt{dx^2 + dy^2} \tag{13.2}$$

How do you actually do this? That will depend on the way that you use to describe the curve itself. Start with the
simplest method and assume that you have a parametric representation of the curve:

$$x = f(t) \quad\text{and}\quad y = g(t)$$

Then $dx = \dot f(t)\,dt$ and $dy = \dot g(t)\,dt$, so

$$d\ell = \sqrt{\big(\dot f(t)\,dt\big)^2 + \big(\dot g(t)\,dt\big)^2} = \sqrt{\dot f(t)^2 + \dot g(t)^2}\;dt \tag{13.3}$$

and the integral for the length is

$$\int d\ell = \int_a^b dt\,\sqrt{\dot f(t)^2 + \dot g(t)^2}$$

where $a$ and $b$ are the limits on the parameter $t$. Think of this as $\int d\ell = \int v\,dt$, where $v$ is the speed.

Do the simplest example first. What is the circumference of a circle? Use the parametrization 

$$x = R\cos\phi,\quad y = R\sin\phi \quad\text{then}\quad d\ell = \sqrt{(-R\sin\phi)^2 + (R\cos\phi)^2}\;d\phi = R\,d\phi \tag{13.4}$$

The circumference is then $\int d\ell = \int_0^{2\pi} R\,d\phi = 2\pi R$. An ellipse is a bit more of a challenge; see problem 13.3.

If the curve is expressed in polar coordinates you may find another formulation preferable, though in essence it is 
the same. The Pythagorean Theorem is still applicable, but you have to see what it says in these coordinates. 


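$$\Delta\ell_k = \sqrt{(\Delta r_k)^2 + (r_k\,\Delta\phi_k)^2}$$

[Figure: a segment $\Delta\ell_k$ of the curve with radial leg $\Delta r_k$ and perpendicular leg $r_k\,\Delta\phi_k$.]

If this picture doesn’t seem to show much of a right triangle, remember there's a limit involved: as $\Delta r_k$ and $\Delta\phi_k$
approach zero this becomes more of a triangle. The integral for the length of a curve is then

$$\int d\ell = \int\sqrt{dr^2 + r^2\,d\phi^2}$$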
To actually do this integral you will pick a parameter to represent the curve, and that parameter may even be $\phi$ itself.
For an example, examine one loop of a logarithmic spiral: $r = r_0e^{k\phi}$.

$$d\ell = \sqrt{dr^2 + r^2\,d\phi^2} = \sqrt{(dr/d\phi)^2 + r^2}\;d\phi$$

The length of the arc from $\phi = 0$ to $\phi = 2\pi$ is

$$\int\sqrt{\big(r_0ke^{k\phi}\big)^2 + \big(r_0e^{k\phi}\big)^2}\;d\phi = \int_0^{2\pi} d\phi\,r_0e^{k\phi}\sqrt{k^2 + 1} = r_0\sqrt{k^2 + 1}\,\frac{1}{k}\left[e^{2k\pi} - 1\right]$$

If $k \to 0$ you can easily check that this gives the correct answer. In the other extreme, for large $k$, you can also check
that it is a plausible result, but it's a little harder to see.
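The spiral result can also be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `arc_length_polar` and the values of `r0` and `k` are illustrative assumptions. It sums $\sqrt{(dr/d\phi)^2 + r^2}\,d\phi$ with a midpoint rule and compares against the closed form, then looks at the small-$k$ limit, where the spiral should close up into a circle of radius $r_0$.

```python
import math

def arc_length_polar(r, drdphi, phi_a, phi_b, n=20000):
    """Midpoint-rule estimate of the integral of sqrt((dr/dphi)^2 + r^2) dphi."""
    h = (phi_b - phi_a) / n
    total = 0.0
    for i in range(n):
        phi = phi_a + (i + 0.5) * h
        total += math.hypot(drdphi(phi), r(phi))
    return total * h

r0, k = 1.0, 0.2  # illustrative values, not taken from the text

numeric = arc_length_polar(lambda p: r0 * math.exp(k * p),
                           lambda p: r0 * k * math.exp(k * p),
                           0.0, 2 * math.pi)
closed_form = r0 * math.sqrt(k**2 + 1) * (math.exp(2 * k * math.pi) - 1) / k
print(numeric, closed_form)

# As k -> 0 the spiral becomes a circle of radius r0, so the length tends to 2*pi*r0.
small_k = 1e-6
limit = r0 * math.sqrt(small_k**2 + 1) * math.expm1(2 * small_k * math.pi) / small_k
print(limit, 2 * math.pi * r0)
```

The two printed pairs should agree closely; `math.expm1` is used in the limit only to avoid round-off in $e^{2k\pi} - 1$ for tiny $k$.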


Weighted Integrals 

The time for a particle to travel along a short segment of a path is $dt = d\ell/v$ where $v$ is the speed. The total time
along a path is of course the integral of $dt$.

$$T = \int dt = \int\frac{d\ell}{v} \tag{13.5}$$


How much time does it take a particle to slide down a curve under the influence of gravity? If the speed is determined 
by gravity without friction, you can use conservation of energy to compute the speed. I'll use the coordinate y measured 
downward from the top point of the curve, then 



$$v = \sqrt{(2E/m) + 2gy} \tag{13.6}$$


Suppose that this particle starts at rest from $y = 0$, then $E = 0$ and $v = \sqrt{2gy}$. Does the total time to reach a
specific point depend on which path you take to get there? Very much so. This will lead to a classic problem called the 
“brachistochrone.” See section 16.3 for that. 

1 Take the straight-line path from $(0, 0)$ to $(x_0, y_0)$. The path is $x = y\cdot x_0/y_0$.

$$d\ell = \sqrt{dx^2 + dy^2} = dy\,\sqrt{1 + x_0^2/y_0^2}, \quad\text{so}$$

$$T = \int\frac{d\ell}{v} = \int_0^{y_0}\frac{dy\,\sqrt{1 + x_0^2/y_0^2}}{\sqrt{2gy}} = \sqrt{1 + x_0^2/y_0^2}\;\frac{1}{\sqrt{2g}}\,2\sqrt{y_0} = \frac{2\sqrt{x_0^2 + y_0^2}}{\sqrt{2gy_0}} \tag{13.7}$$

2 There are an infinite number of possible paths, and another choice of path can give a smaller
or a larger time. Take another path for which it’s easy to compute the total time. Drop straight
down in order to pick up speed, then turn a sharp corner and coast horizontally. Compute the
time along this path and it is the sum of two pieces.

$$\int_0^{y_0}\frac{dy}{\sqrt{2gy}} + \int_0^{x_0}\frac{dx}{\sqrt{2gy_0}} = \frac{1}{\sqrt{2g}}\left[2\sqrt{y_0} + \frac{x_0}{\sqrt{y_0}}\right] = \frac{1}{\sqrt{2gy_0}}\big[x_0 + 2y_0\big] \tag{13.8}$$


Which one takes a shorter time? See problem 13.9. 

3 What if the path is a parabola, $x = y^2\cdot x_0/y_0^2$? It drops rapidly at first, picking up speed, but then takes a more direct
route to the end. Use $y$ as the coordinate, then

$$dx = 2y\,\frac{x_0}{y_0^2}\,dy, \quad\text{and}\quad d\ell = \sqrt{\big(4y^2x_0^2/y_0^4\big) + 1}\;dy$$

$$T = \int\frac{d\ell}{v} = \int_0^{y_0}\frac{\sqrt{\big(4y^2x_0^2/y_0^4\big) + 1}}{\sqrt{2gy}}\,dy$$

This is not an integral that you’re likely to have encountered yet. I'll refer you to a large table of integrals, where you 
can perhaps find it under the heading of elliptic integrals. 
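If a table of integrals is not at hand, the three travel times can be compared numerically. This is a rough sketch under stated assumptions: the function name `slide_time`, the choice $g = 9.8$, and the endpoint $(x_0, y_0) = (2, 1)$ are all illustrative, not from the text. The substitution $y = u^2$ is used only to remove the $1/\sqrt{y}$ singularity at the starting point.

```python
import math

def slide_time(dxdy, y0, g=9.8, n=20000):
    """Time to slide from rest along x(y), 0 <= y <= y0, with v = sqrt(2 g y).

    With y = u**2 the integrand sqrt(1 + (dx/dy)^2) / sqrt(2 g y) dy becomes
    2 * sqrt(1 + (dx/dy)^2) / sqrt(2 g) du, which is smooth at u = 0.
    """
    u_max = math.sqrt(y0)
    h = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += 2.0 * math.sqrt(1.0 + dxdy(u * u) ** 2)
    return total * h / math.sqrt(2.0 * g)

x0, y0, g = 2.0, 1.0, 9.8  # illustrative endpoint and gravity

t_line = slide_time(lambda y: x0 / y0, y0, g)                    # path 1
t_parabola = slide_time(lambda y: 2.0 * y * x0 / y0**2, y0, g)   # path 3
# Path 2: free fall through y0, then coast a distance x0 at speed sqrt(2 g y0).
t_corner = math.sqrt(2.0 * y0 / g) + x0 / math.sqrt(2.0 * g * y0)

print(t_line, t_corner, t_parabola)
```

Changing the ratio `x0 / y0` and rerunning is a quick way to explore which path wins in which regime before attempting problem 13.9 by hand.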

In more advanced treatments of optics, the time it takes light to travel along a path is of central importance 
because it is related to the phase of the light wave along that path. In that context however, you usually see it written 
with an extra factor of the speed of light. 

$$cT = \int\frac{c\,d\ell}{v} = \int n\,d\ell \tag{13.9}$$

This last form, written in terms of the index of refraction, is called the optical path. Compare problems 2.35 and 2.39. 


13.2 Line Integrals 

Work done on a point mass in one dimension is an integral. If the mass moves in three dimensions and if the force 
happens to be a constant, then work is simply a dot product: 

$$W = \int_{x_i}^{x_f} F_x(x)\,dx \qquad\text{respectively}\qquad W = \vec F\cdot\Delta\vec r$$

The general case for work on a particle moving along a trajectory in space is a line integral. It combines these two 
equations into a single expression for the work along an arbitrary path for an arbitrary force. There is not then any 
restriction to the elementary case of constant force.

The basic idea is a combination of Eqs. (13.1) and (13.2). Divide the specified curve into a number of pieces, at
the points $\{\vec r_k\}$. Between points $k - 1$ and $k$ you had the estimate of the arc length as $\sqrt{(\Delta x_k)^2 + (\Delta y_k)^2}$, but here
you need the whole vector from $\vec r_{k-1}$ to $\vec r_k$ in order to evaluate the work done as the mass moves from one point to the
next. Let $\Delta\vec r_k = \vec r_k - \vec r_{k-1}$, then

$$\lim_{|\Delta\vec r_k|\to 0}\sum_{k=1}^{N}\vec F(\vec r_k)\cdot\Delta\vec r_k = \int\vec F(\vec r)\cdot d\vec r \tag{13.10}$$


This is the definition of a line integral. 

How do you evaluate these integrals? To repeat what happened with Eq. (13.2), that will depend on the way that 
you use to describe the curve itself. Start with the simplest method and assume that you have a parametric representation 
of the curve: $\vec r(t)$, then $d\vec r = \dot{\vec r}\,dt$ and the integral is

$$\int\vec F(\vec r)\cdot d\vec r = \int\vec F\big(\vec r(t)\big)\cdot\dot{\vec r}\,dt$$


This is now an ordinary integral with respect to t. In many specific examples, you may find an easier way to represent 
the curve, but this is something that you can always fall back on. 

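In order to see exactly where this is used, start with $\vec F = m\vec a$. Take the dot product with $d\vec r$ and manipulate the
expression.

$$\vec F = m\frac{d\vec v}{dt}, \quad\text{so}\quad \vec F\cdot d\vec r = m\frac{d\vec v}{dt}\cdot d\vec r = m\frac{d\vec v}{dt}\cdot\frac{d\vec r}{dt}\,dt = m\,d\vec v\cdot\frac{d\vec r}{dt} = m\,\vec v\cdot d\vec v$$

$$\text{or}\quad \vec F\cdot d\vec r = \frac{m}{2}\,d\big(\vec v\cdot\vec v\big) \tag{13.11}$$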


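The integral of this from an initial point of the motion to a final point is

$$\int_{\vec r_i}^{\vec r_f}\vec F\cdot d\vec r = \int\frac{m}{2}\,d\big(\vec v\cdot\vec v\big) = \frac{m}{2}\big[v_f^2 - v_i^2\big] \tag{13.12}$$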


This is a standard form of the work-energy theorem in mechanics. In most cases you have to specify the whole path, not
just the endpoints, so this way of writing the theorem is somewhat misleading. Is it legitimate to manipulate $\vec v\cdot d\vec v$ as
in Eq. (13.11)? Yes. Simply write it in rectangular components as $v_x\,dv_x + v_y\,dv_y + v_z\,dv_z$ and you can integrate each
term with no problem; then assemble the result as $v^2/2$.

Example

If $\vec F = Axy\,\hat x + B(x^2 + L^2)\,\hat y$, what is the work done going from point $(0, 0)$ to $(L, L)$ along the three different paths
indicated?

$$\int\vec F\cdot d\vec r = \int\big[F_x\hat x + F_y\hat y\big]\cdot\big[\hat x\,dx + \hat y\,dy\big] = \int\big[F_x\,dx + F_y\,dy\big]$$

$$\int_{C_1}\vec F\cdot d\vec r = \int_0^L dx\,0 + \int_0^L dy\,B\,2L^2 = 2BL^3$$

$$\int_{C_2}\vec F\cdot d\vec r = \int_0^L dx\,Ax^2 + \int_0^L dy\,B(y^2 + L^2) = AL^3/3 + 4BL^3/3$$

$$\int_{C_3}\vec F\cdot d\vec r = \int_0^L dy\,B(0 + L^2) + \int_0^L dx\,AxL = BL^3 + AL^3/2$$
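The three results can be checked by summing $\vec F\cdot\Delta\vec r$ directly, as in the definition of the line integral. This is a small sketch, with assumptions stated plainly: the figure showing the paths is not reproduced here, so the three paths below (along $x$ then up, the diagonal $y = x$, up then along $x$) are inferred from the integrals in the example, and the names `line_integral`, `corner_x_first`, `diagonal`, `corner_y_first` and the values of `A`, `B`, `L` are illustrative.

```python
def line_integral(force, path, n=20000):
    """Sum F . (delta r) over n chords of path(t), 0 <= t <= 1, with F at chord midpoints."""
    total = 0.0
    for i in range(n):
        x_a, y_a = path(i / n)
        x_b, y_b = path((i + 1) / n)
        fx, fy = force(0.5 * (x_a + x_b), 0.5 * (y_a + y_b))
        total += fx * (x_b - x_a) + fy * (y_b - y_a)
    return total

A, B, L = 1.0, 1.0, 1.0  # illustrative values

def force(x, y):
    return A * x * y, B * (x**2 + L**2)

def corner_x_first(t):   # along the x-axis to (L, 0), then up to (L, L)
    return (2 * t * L, 0.0) if t < 0.5 else (L, (2 * t - 1) * L)

def diagonal(t):         # straight line y = x
    return t * L, t * L

def corner_y_first(t):   # up the y-axis to (0, L), then across to (L, L)
    return (0.0, 2 * t * L) if t < 0.5 else ((2 * t - 1) * L, L)

for name, path in [("x first", corner_x_first),
                   ("diagonal", diagonal),
                   ("y first", corner_y_first)]:
    print(name, line_integral(force, path))
```

The three printed values differ, which is the point of the example: for this force the work depends on the path, not only on the endpoints.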



Gradient 

What is the line integral of a gradient? Recall from section 8.5 and Eq. (8.16) that $df = \operatorname{grad} f\cdot d\vec r$. The integral of
the gradient is then

$$\int\operatorname{grad} f\cdot d\vec r = \int df = f_2 - f_1 \tag{13.13}$$

where the indices represent the initial and final points. When you integrate a gradient, you need the function only at its 
endpoints. The path doesn’t matter. Well, almost. See problem 13.19 for a caution. 


13.3 Gauss's Theorem 

The original definition of the divergence of a vector field is Eq. (9.9),

$$\operatorname{div}\vec v = \lim_{V\to 0}\frac{1}{V}\frac{dV}{dt} = \lim_{V\to 0}\frac{1}{V}\oint\vec v\cdot d\vec A$$

Fix a closed surface and evaluate the surface integral of $\vec v$ over that surface.

Now divide this volume into a lot of little volumes $\Delta V_k$ with individual bounding surfaces $S_k$. The picture on the right
shows just two adjoining pieces of the whole volume, but there are many more. If you do the surface integrals of $\vec v\cdot d\vec A$ over
each of these pieces and add all of them, the result is the original surface integral.

$$\sum_k\oint_{S_k}\vec v\cdot d\vec A = \oint_S\vec v\cdot d\vec A \tag{13.14}$$


The reason for this is that each interior face of volume $V_k$ is matched with the face of an adjoining volume $V_{k'}$. The
latter face will have $d\vec A$ pointing in the opposite direction, $\hat n_{k'} = -\hat n_k$, so when you add all the interior surface integrals
they cancel. All that’s left is the surface on the outside and the sum over all those faces is the original surface integral.
In the equation (13.14) multiply and divide every term in the sum by the volume $\Delta V_k$.

$$\sum_k\left[\frac{1}{\Delta V_k}\oint_{S_k}\vec v\cdot d\vec A\right]\Delta V_k = \oint_S\vec v\cdot d\vec A$$


Now increase the number of subdivisions, finally taking the limit as all the $\Delta V_k$ approach zero. The quantity inside the
brackets becomes the definition of the divergence of $\vec v$ and you then get

$$\text{Gauss’s Theorem:}\qquad \int_V\operatorname{div}\vec v\;dV = \oint_S\vec v\cdot d\vec A \tag{13.15}$$


This* is Gauss’s theorem, the divergence theorem. 

Example 

Verify Gauss's Theorem for the solid hemisphere, $r < R$, $0 < \theta < \pi/2$, $0 < \phi < 2\pi$. Use the vector field

$$\vec F = \hat r\,\alpha r^2\sin\theta + \hat\theta\,\beta r\theta^2\cos^2\phi + \hat\phi\,\gamma r\sin\theta\cos^2\phi \tag{13.16}$$



* You will sometimes see the notation $\partial V$ instead of $S$ for the boundary surface surrounding the volume $V$. Also
$\partial A$ instead of $C$ for the boundary curve surrounding the area $A$. It is probably a better and more consistent notation,
but it isn’t yet as common in physics books.

Doing the surface integral on the hemisphere $\hat n = \hat r$, and on the bottom flat disk you have $\hat n = \hat\theta$. The surface integral
is then assembled from two pieces,

$$\oint\vec F\cdot d\vec A = \int_{\text{hemisph}}\hat r\,\alpha r^2\sin\theta\cdot\hat r\,dA + \int_{\text{disk}}\hat\theta\,\beta r\theta^2\cos^2\phi\cdot\hat\theta\,dA$$

$$= \int_0^{\pi/2} R^2\sin\theta\,d\theta\int_0^{2\pi} d\phi\;\alpha R^2\sin\theta + \int_0^R r\,dr\int_0^{2\pi} d\phi\;\beta r(\pi/2)^2\cos^2\phi$$

$$= \alpha\pi^2R^4/2 + \beta\pi^3R^3/12 \tag{13.17}$$


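Now do the volume integral of the divergence, using Eq. (9.16).

$$\operatorname{div}\vec F = \frac{1}{r^2}\frac{\partial}{\partial r}\,\alpha r^4\sin\theta + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\big(\sin\theta\;\beta r\theta^2\cos^2\phi\big) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}\big(\gamma r\sin\theta\cos^2\phi\big)$$

$$= 4\alpha r\sin\theta + \beta\cos^2\phi\,\big[2\theta + \theta^2\cot\theta\big] - 2\gamma\sin\phi\cos\phi$$

The $\gamma$ term in the volume integral is zero because the $2\sin\phi\cos\phi = \sin 2\phi$ factor averages to zero over all angles.

$$\int_0^R dr\,r^2\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{2\pi} d\phi\,\Big[4\alpha r\sin\theta + \beta\cos^2\phi\,\big[2\theta + \theta^2\cot\theta\big]\Big]$$

$$= 4\alpha\cdot\frac{R^4}{4}\cdot 2\pi\cdot\frac{\pi}{4} + \beta\cdot\frac{R^3}{3}\cdot\pi\cdot\int_0^{\pi/2} d\theta\,\sin\theta\,\big[2\theta + \theta^2\cot\theta\big]$$

$$= \alpha\pi^2R^4/2 + \beta\pi^3R^3/12$$

The last integration used parametric differentiation starting from $\int_0^{\pi/2} d\theta\,\cos k\theta$, with differentiation with respect to $k$.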
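Both sides of this hemisphere example can also be checked by brute-force numerical integration. The sketch below is only an illustration: the constants `alpha`, `beta`, `gamma`, `R` are arbitrary choices, `midpoints` and `div_F` are hypothetical helper names, and a plain midpoint rule on a modest grid is assumed to be accurate enough for a sanity check.

```python
import math

alpha, beta, gamma, R = 1.3, 0.7, 2.1, 1.5  # illustrative values

def midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], and the subinterval width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

def div_F(r, th, ph):
    # Divergence of F in spherical coordinates, term by term.
    return (4 * alpha * r * math.sin(th)
            + beta * math.cos(ph) ** 2 * (2 * th + th ** 2 / math.tan(th))
            - 2 * gamma * math.sin(ph) * math.cos(ph))

n = 60
rs, hr = midpoints(0.0, R, n)
ths, hth = midpoints(0.0, math.pi / 2, n)
phs, hph = midpoints(0.0, 2 * math.pi, n)

# Volume side: integrate div F with dV = r^2 sin(theta) dr dtheta dphi.
volume_side = sum(div_F(r, th, ph) * r * r * math.sin(th)
                  for r in rs for th in ths for ph in phs) * hr * hth * hph

# Curved surface r = R: outward normal r-hat, dA = R^2 sin(theta) dtheta dphi.
cap = sum(alpha * R**2 * math.sin(th) * R**2 * math.sin(th)
          for th in ths for ph in phs) * hth * hph
# Flat disk theta = pi/2: outward normal theta-hat, dA = r dr dphi.
disk = sum(beta * r * (math.pi / 2) ** 2 * math.cos(ph) ** 2 * r
           for r in rs for ph in phs) * hr * hph
surface_side = cap + disk

closed_form = alpha * math.pi**2 * R**4 / 2 + beta * math.pi**3 * R**3 / 12
print(volume_side, surface_side, closed_form)
```

Note that `gamma` drops out of both sides, as the text argues: changing it should leave all three printed numbers essentially unchanged.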


13.4 Stokes’ Theorem 

The expression for the curl in terms of integrals is Eq. (9.17),

$$\operatorname{curl}\vec v = \lim_{V\to 0}\frac{1}{V}\oint d\vec A\times\vec v \tag{13.18}$$


Use the same reasoning that leads from the definition of the divergence to Eqs. (13.14) and (13.15) (see problem 13.6), 
and this leads to the analog of Gauss’s theorem, but with cross products. 


$$\oint_S d\vec A\times\vec v = \int_V\operatorname{curl}\vec v\;dV \tag{13.19}$$

This isn't yet in a form that is all that convenient, and a special case is both easier to interpret and more useful in
applications. First apply it to a particular volume, one that is very thin and small. Take a tiny disk of height $\Delta h$, with
top and bottom area $\Delta A_1$. Let $\hat n_1$ be the unit normal vector out of the top area. For small enough values of these
dimensions, the right side of Eq. (13.19) is simply the value of the vector $\operatorname{curl}\vec v$ inside the volume times the volume
$\Delta A_1\Delta h$ itself.

$$\oint_S d\vec A\times\vec v = \int_V\operatorname{curl}\vec v\;dV = \operatorname{curl}\vec v\;\Delta A_1\Delta h$$

Take the dot product of both sides with $\hat n_1$, and the parts of the surface integral from the top and the bottom faces
disappear. That’s just the statement that on the top and the bottom, $d\vec A$ is in the direction of $\pm\hat n_1$, so the cross product
makes $d\vec A\times\vec v$ perpendicular to $\hat n_1$.

I'm using the subscript 1 for the top surface and I'll use 2 for the surface around the edge. Otherwise it's too easy
to get the notation mixed up.

Now look at $d\vec A\times\vec v$ around the thin edge. The element of area has height $\Delta h$ and length $\Delta\ell$ along the arc. Call
$\hat n_2$ the unit normal out of the edge.

$$\Delta\vec A_2 = \Delta h\,\Delta\ell\;\hat n_2$$

The product $\hat n_1\cdot\Delta\vec A_2\times\vec v = \hat n_1\cdot\hat n_2\times\vec v\;\Delta h\,\Delta\ell = \hat n_1\times\hat n_2\cdot\vec v\;\Delta h\,\Delta\ell$, using the property of the triple scalar product. The
product $\hat n_1\times\hat n_2$ is in the direction along the arc of the edge, so

$$\hat n_1\times\hat n_2\,\Delta\ell = \Delta\vec\ell \tag{13.20}$$


Put all these pieces together and you have

$$\hat n_1\cdot\oint_S d\vec A\times\vec v = \oint_C\vec v\cdot d\vec\ell\;\Delta h = \hat n_1\cdot\operatorname{curl}\vec v\;\Delta A_1\Delta h$$

Divide by $\Delta A_1\Delta h$ and take the limit as $\Delta A_1 \to 0$. Recall that all the manipulations above work only under the
assumption that you take this limit.

$$\hat n_1\cdot\operatorname{curl}\vec v = \lim_{\Delta A_1\to 0}\frac{1}{\Delta A_1}\oint_C\vec v\cdot d\vec\ell \tag{13.21}$$

You will sometimes see this equation (13.21) taken as the definition of the curl, and it does have an intuitive appeal.
The one drawback to doing this is that it isn't at all obvious that the thing on the right-hand side is the dot product of
$\hat n_1$ with anything. It is, because I deduced that fact from the vectors in Eq. (13.19), but if you use Eq. (13.21) as your
starting point you have some proving to do.

This form is easier to interpret than was the starting point with a volume integral. The line integral of $\vec v\cdot d\vec\ell$ is
called the circulation of $\vec v$ around the loop. Divide this by the area of the loop and take the limit as the area goes to
zero and you then have the “circulation density” of the vector field. The component of the curl along some direction is
then the circulation density around that direction. Notice that the equation (13.20) dictates the right-hand rule that the
direction of integration around the loop is related to the direction of the normal $\hat n_1$.

Stokes' theorem follows in a few lines from Eq. (13.21). Pick a surface $A$ with a boundary $C$ (or $\partial A$ in the other
notation). The surface doesn’t have to be flat, but you have to be able to tell one side from the other.* From here on
I’ll imitate the procedure of Eq. (13.14). Divide the surface into a lot of little pieces $\Delta A_k$, and do the line integral of
$\vec v\cdot d\vec\ell$ around each piece. Add all these pieces and the result is the whole line integral around the outside curve.

$$\sum_k\oint_{C_k}\vec v\cdot d\vec\ell = \oint_C\vec v\cdot d\vec\ell \tag{13.22}$$


As before, on each interior boundary between area $\Delta A_k$ and the adjoining $\Delta A_{k'}$, the parts of the line integrals on the
common boundary cancel because the directions of integration are opposite to each other. All that's left is the curve on
the outside of the whole loop, and the sum over those intervals is the original line integral.

Multiply and divide each term in the sum (13.22) by $\Delta A_k$ and you have

$$\sum_k\left[\frac{1}{\Delta A_k}\oint_{C_k}\vec v\cdot d\vec\ell\right]\Delta A_k = \oint_C\vec v\cdot d\vec\ell \tag{13.23}$$


Now increase the number of subdivisions of the surface, finally taking the limit as all the $\Delta A_k \to 0$, and the quantity
inside the brackets becomes the normal component of the curl of $\vec v$ by Eq. (13.21). The limit of the sum is the definition
of an integral, so

$$\text{Stokes' Theorem:}\qquad \int_A\operatorname{curl}\vec v\cdot d\vec A = \oint_C\vec v\cdot d\vec\ell \tag{13.24}$$


* That means no Klein bottles or Möbius strips.

What happens if the vector field $\vec v$ is the gradient of a function, $\vec v = \nabla f$? By Eq. (13.13) the line integral in
(13.24) depends on just the endpoints of the path, but in this integral the initial and final points are the same. That
makes the integral zero: $f_1 - f_1$. That implies that the surface integral on the left is zero no matter what the surface
spanning the contour is, and that can happen only if the thing being integrated is itself zero, $\operatorname{curl}\operatorname{grad} f = 0$. That’s
one of the common vector identities in problem 9.36. Of course this statement requires the usual assumption that there
are no singularities of $\vec v$ within the area.

Example 

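Verify Stokes' theorem for that part of a spherical surface $r = R$, $0 < \theta < \theta_0$, $0 < \phi < 2\pi$.

Use for this example the vector field

$$\vec F = \hat r\,Ar^2\sin\theta + \hat\theta\,Br\theta^2\cos\phi + \hat\phi\,Cr\sin\theta\cos^2\phi \tag{13.25}$$

To compute the curl of $\vec F$, use Eq. (9.33), getting

$$\nabla\times\vec F = \hat r\,\frac{1}{r\sin\theta}\left(\frac{\partial}{\partial\theta}\big(\sin\theta\;Cr\sin\theta\cos^2\phi\big) - \frac{\partial}{\partial\phi}\big(Br\theta^2\cos\phi\big)\right) + \cdots$$

$$= \hat r\,\frac{1}{r\sin\theta}\big(Cr\cos^2\phi\;2\sin\theta\cos\theta + Br\theta^2\sin\phi\big) + \cdots \tag{13.26}$$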


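I need only the $\hat r$ component of the curl because the surface integral uses only the normal ($\hat r$) component. The
surface integral of this has the area element $dA = r^2\sin\theta\,d\theta\,d\phi$.

$$\int\operatorname{curl}\vec F\cdot d\vec A = \int_0^{\theta_0} R^2\sin\theta\,d\theta\int_0^{2\pi} d\phi\,\frac{1}{R\sin\theta}\big(CR\cos^2\phi\;2\sin\theta\cos\theta + BR\theta^2\sin\phi\big)$$

$$= R^2\int_0^{\theta_0} d\theta\int_0^{2\pi} d\phi\;2C\cos^2\phi\,\sin\theta\cos\theta$$

$$= R^2\,2C\pi\sin^2\theta_0/2 = CR^2\pi\sin^2\theta_0$$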

The other side of Stokes' theorem is the line integral around the circle at angle $\theta_0$.

$$\oint\vec F\cdot d\vec\ell = \int_0^{2\pi} r\sin\theta_0\,d\phi\;Cr\sin\theta\cos^2\phi = \int_0^{2\pi} d\phi\;CR^2\sin^2\theta_0\cos^2\phi = CR^2\sin^2\theta_0\,\pi \tag{13.27}$$

and the two sides of the theorem agree. Check! Did I get the overall signs right? The direction of integration around
the loop matters. A further check: If $\theta_0 = \pi$, the length of the loop is zero and both integrals give zero as they should.
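As with the Gauss example, the two sides can be compared numerically. This is a sketch under assumed values: `A`, `B`, `C`, `R`, `theta0` are arbitrary, and `midpoints` and `curl_r` are hypothetical helper names. The surface side integrates the radial component of the curl over the cap; the line side integrates $F_\phi$ around the bounding circle in the direction of increasing $\phi$.

```python
import math

A, B, C, R, theta0 = 0.9, 1.7, 1.2, 2.0, 1.0  # illustrative values

def midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], and the subinterval width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

def curl_r(r, th, ph):
    # Radial component of curl F for the field of the example.
    return (C * r * math.cos(ph) ** 2 * 2 * math.sin(th) * math.cos(th)
            + B * r * th ** 2 * math.sin(ph)) / (r * math.sin(th))

n = 400
ths, hth = midpoints(0.0, theta0, n)
phs, hph = midpoints(0.0, 2 * math.pi, n)

# Surface side: dA = R^2 sin(theta) dtheta dphi on the sphere r = R.
surface_side = sum(curl_r(R, th, ph) * R**2 * math.sin(th)
                   for th in ths for ph in phs) * hth * hph

# Line side: dl = phi-hat R sin(theta0) dphi, so only F_phi contributes.
line_side = sum(C * R * math.sin(theta0) * math.cos(ph) ** 2 * R * math.sin(theta0)
                for ph in phs) * hph

closed_form = C * R**2 * math.pi * math.sin(theta0) ** 2
print(surface_side, line_side, closed_form)
```

Setting `theta0 = math.pi` is a direct way to try the further check mentioned above: both sides should then come out at essentially zero.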

Conservative Fields 

An immediate corollary of Stokes' theorem is that if the curl of a vector field is zero throughout a region then line
integrals are independent of path in that region. To state it a bit more precisely, in a volume for which any closed path
can be shrunk to a point without leaving the region, if the curl of $\vec v$ equals zero, then $\int_a^b\vec v\cdot d\vec r$ depends on the endpoints
of the path, and not on how you get there.

To see why this follows, take two integrals from point $a$ to point $b$.

$$\int_1\vec v\cdot d\vec r - \int_2\vec v\cdot d\vec r = \oint\vec v\cdot d\vec r$$

This equation holds because the minus sign is the same thing that you get by integrating in the reverse direction.
For a field with $\nabla\times\vec v = 0$, Stokes’ theorem says that this closed path integral is zero, and the statement is proved.

What was that fussy-sounding statement “for which any closed path can be shrunk to a point without leaving the 
region?” Consider the vector field in three dimensions, written in rectangular and cylindrical coordinates, 


$$\vec v = A\big(x\,\hat y - y\,\hat x\big)/(x^2 + y^2) = A\,\hat\phi/r \tag{13.28}$$


You can verify (in either coordinate system) that its curl is zero, except for the $z$-axis, where it is singular. A closed
loop line integral that doesn’t encircle the $z$-axis will be zero, but if it does go around the axis then it is not. The vector's
direction $\hat\phi$ always points counterclockwise around the axis. See problem 13.17. If you have a loop that encloses the
singular line, then you can't shrink the loop without its getting hung up on the axis.
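A quick numerical experiment makes the distinction concrete. In this sketch the function names `v` and `circulation`, the strength `A = 1`, and the two circular loops are illustrative assumptions; the loops lie in the plane $z = 0$, one centred on the axis and one displaced so that it does not enclose it.

```python
import math

A = 1.0  # illustrative strength of the field

def v(x, y):
    # Rectangular components of A * phi-hat / r.
    s = x * x + y * y
    return -A * y / s, A * x / s

def circulation(cx, cy, radius, n=20000):
    """Line integral of v . dr counterclockwise around a circle centred at (cx, cy)."""
    total = 0.0
    h = 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * h
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        vx, vy = v(x, y)
        # dr = (-radius sin t, radius cos t) dt along the circle
        total += (vx * -radius * math.sin(t) + vy * radius * math.cos(t)) * h
    return total

around_axis = circulation(0.0, 0.0, 1.0)  # loop encloses the z-axis
off_axis = circulation(3.0, 0.0, 1.0)     # loop does not enclose it
print(around_axis, off_axis)
```

The first number is $2\pi A$ regardless of the radius, while the second vanishes to round-off; the curl is zero on both loops, so only the topology of the region distinguishes them.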

The converse of this theorem is also true. If every closed-path line integral of $\vec v$ is zero, and if the derivatives of
$\vec v$ are continuous, then its curl is zero. Stokes’ theorem tells you that every surface integral of $\nabla\times\vec v$ is zero, so you
can pick a point and a small $\Delta\vec A$ at this point. For small enough area whatever the curl is, it won't change much. The
integral over this small area is then $\nabla\times\vec v\cdot\Delta\vec A$, and by assumption this is zero. It’s zero for all values of the area vector.
The only vector whose dot product with all vectors is zero is itself the zero vector.

Potentials

The relation between the vanishing curl and the fact that the line integral is independent of path leads to the existence 
of potential functions. 

If $\operatorname{curl}\vec F = 0$ in a simply-connected domain (that's one for which any closed loop can be shrunk to a point), then
I can write $\vec F$ as a gradient, $-\operatorname{grad} f$. The minus sign is conventional. I've already constructed the answer (almost),
and to complete the calculation note that line integrals are independent of path in such a domain, and that means that
the integral

$$\int_{\vec r_0}^{\vec r}\vec F\cdot d\vec r \tag{13.29}$$

is a function of the two endpoints alone. Fix $\vec r_0$ and treat this as a function of the upper limit $\vec r$. Call it $-f(\vec r)$. The
defining equation for the gradient is Eq. (8.16),

$$df = \operatorname{grad} f\cdot d\vec r$$

How does the integral (13.29) change when you change $\vec r$ a bit?

$$\int_{\vec r_0}^{\vec r + d\vec r}\vec F\cdot d\vec r - \int_{\vec r_0}^{\vec r}\vec F\cdot d\vec r = \int_{\vec r}^{\vec r + d\vec r}\vec F\cdot d\vec r = \vec F\cdot d\vec r$$

This is $-df$ because I called this integral $-f(\vec r)$. Compare the last two equations and because $d\vec r$ is arbitrary you
immediately get

$$\vec F = -\operatorname{grad} f \tag{13.30}$$

I used this equation in section 9.9, stating that the existence of the gravitational potential energy followed from the fact
that $\nabla\times\vec g = 0$.

Vector Potentials 

This is not strictly under the subject of conservative fields, but it’s a convenient place to discuss it anyway. When a
vector field has zero curl then it's a gradient. When a vector field has zero divergence then it's a curl. In both cases the
converse is simple, and it's what you see first: $\nabla\times\nabla f = 0$ and $\nabla\cdot\nabla\times\vec A = 0$ (problem 9.36). In Eqs. (13.29) and
(13.30) I was able to construct the function $f$ because $\nabla\times\vec F = 0$. It is also possible, if $\nabla\cdot\vec F = 0$, to construct the
function $\vec A$ such that $\vec F = \nabla\times\vec A$.

In both cases, there are extra conditions needed for the statements to be completely true. To conclude that a
conservative field ($\nabla\times\vec F = 0$) is a gradient requires that the domain be simply-connected, allowing the line integral to
be completely independent of path. To conclude that a field satisfying $\nabla\cdot\vec F = 0$ can be written as $\vec F = \nabla\times\vec A$ requires
something similar: that all closed surfaces can be shrunk to a point. This statement is not so easy to prove, and the
explicit construction of $\vec A$ from $\vec F$ is not very enlightening.

You can easily verify that $\vec A = \vec B\times\vec r/2$ is a vector potential for the uniform field $\vec B$. Neither the scalar potential
nor the vector potential is unique. You can always add a constant to a scalar potential because the gradient of a
constant is zero and it doesn’t change the result. For the vector potential you can add the gradient of an arbitrary function
because that doesn't change the curl.

$$\vec F = -\nabla(f + C) = -\nabla f, \quad\text{and}\quad \vec B = \nabla\times\big(\vec A + \nabla f\big) = \nabla\times\vec A \tag{13.31}$$
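The claim about the uniform field is easy to test with finite differences. In this sketch the helper names `cross` and `curl`, the sample field `B`, and the evaluation point are arbitrary illustrative choices; the curl is approximated by central differences, which happen to be exact (up to round-off) here because $\vec A$ is linear in the coordinates.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

B = (0.3, -1.2, 2.0)  # an arbitrary uniform field, chosen for illustration

def A(r):
    # Candidate vector potential A = B x r / 2.
    c = cross(B, r)
    return (c[0] / 2, c[1] / 2, c[2] / 2)

def curl(field, r, h=1e-3):
    """Central-difference curl of a vector field at the point r."""
    def d(i, j):  # partial derivative of component i with respect to coordinate j
        plus, minus = list(r), list(r)
        plus[j] += h
        minus[j] -= h
        return (field(plus)[i] - field(minus)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

point = (0.7, -0.4, 1.9)
print(curl(A, point), B)
```

Adding the gradient of any smooth function to `A` and recomputing the curl is a simple way to see the non-uniqueness described above.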


13.5 Reynolds Transport Theorem 

When an integral has limits that are functions of time, how do you differentiate it? That's pretty easy for one-dimensional 
integrals, as in Eqs. (1.19) and (1.21). 


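$$\frac{d}{dt}\int_{f_1(t)}^{f_2(t)} dx\,g(x,t) = \int_{f_1(t)}^{f_2(t)} dx\,\frac{\partial g(x,t)}{\partial t} + g\big(f_2(t),t\big)\frac{df_2(t)}{dt} - g\big(f_1(t),t\big)\frac{df_1(t)}{dt} \tag{13.32}$$

One of Maxwell’s equations for electromagnetism is

$$\nabla\times\vec E = -\frac{\partial\vec B}{\partial t} \tag{13.33}$$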


Integrate this equation over the surface $S$.

$$\int_S\nabla\times\vec E\cdot d\vec A = \oint_C\vec E\cdot d\vec\ell = \int_S -\frac{\partial\vec B}{\partial t}\cdot d\vec A \tag{13.34}$$

This used Stokes' theorem, and I would like to be able to pull the time derivative out of the integral, but can I? If the 
surface is itself time independent then the answer is yes, but what if it isn’t? What if the surface integral has a surface 
that is moving? Can this happen? That’s how generators work, and you wouldn't be reading this now without the
power they provide. The copper wire loops are rotating at high speed, and it is this motion that provides the EMF. 

I'll work backwards and compute the time derivative of a surface integral, allowing the surface itself to move. To
do this, I'll return to the definition of a derivative. The time variable appears in two places, so use the standard trick of
adding and subtracting a term, just as in section 1.5. Call $\Phi$ the flux integral, $\int\vec B\cdot d\vec A$.

$$\Delta\Phi = \int_{S(t+\Delta t)}\vec B(t+\Delta t)\cdot d\vec A - \int_{S(t)}\vec B(t)\cdot d\vec A$$

$$= \int_{S(t+\Delta t)}\vec B(t+\Delta t)\cdot d\vec A - \int_{S(t+\Delta t)}\vec B(t)\cdot d\vec A + \int_{S(t+\Delta t)}\vec B(t)\cdot d\vec A - \int_{S(t)}\vec B(t)\cdot d\vec A \tag{13.35}$$

$\vec B$ is a function of $\vec r$ too, but I won't write it. The first two terms have the same surface, so they combine to give

$$\int_{S(t+\Delta t)}\Delta\vec B\cdot d\vec A$$

and when you divide by $\Delta t$ and let it approach zero, you get

$$\int_{S(t)}\frac{\partial\vec B}{\partial t}\cdot d\vec A$$

Now for the next two terms, which require some manipulation. Add and subtract the surface that forms the edge between
the boundaries $C(t)$ and $C(t + \Delta t)$.

$$\int_{S(t+\Delta t)}\vec B(t)\cdot d\vec A - \int_{S(t)}\vec B(t)\cdot d\vec A = \oint\vec B(t)\cdot d\vec A - \int_{\text{edge}}\vec B\cdot d\vec A \tag{13.36}$$

The strip around the edge between the two surfaces makes the surface integral closed, but I then have to subtract it as
a separate term.

You can convert the surface integral to a volume integral with Gauss's theorem, but it's still necessary to figure
out how to write the volume element. [Yes, $\nabla\cdot\vec B = 0$, but this result can be applied in other cases too, so don't use that
fact here.] The surface is moving at velocity $\vec v$, so an area element $\Delta\vec A$ will in time $\Delta t$ sweep out a volume $\Delta\vec A\cdot\vec v\,\Delta t$.
Note: $\vec v$ isn’t necessarily a constant in space and these surfaces aren't necessarily flat.

$$\Delta V = \Delta\vec A\cdot\vec v\,\Delta t$$

$$\oint\vec B(t)\cdot d\vec A = \int d^3r\,\nabla\cdot\vec B = \int_{S(t)}\nabla\cdot\vec B\;d\vec A\cdot\vec v\,\Delta t \tag{13.37}$$

To do the surface integral around the edge, use the same method as in deriving Stokes' theorem, Eq. (13.20).

$$\int_{\text{edge}}\vec B\cdot d\vec A = \oint_C\vec B\cdot d\vec\ell\times\vec v\,\Delta t = \oint_C\vec v\times\vec B\cdot d\vec\ell\;\Delta t \tag{13.38}$$

Put Eqs. (13.37) and (13.38) into Eq. (13.36) and then into Eq. (13.35), divide by $\Delta t$ and let $\Delta t \to 0$.

$$\frac{d}{dt}\int_{S(t)}\vec B\cdot d\vec A = \int_{S(t)}\frac{\partial\vec B}{\partial t}\cdot d\vec A + \int_{S(t)}\nabla\cdot\vec B\;\vec v\cdot d\vec A - \oint_{C(t)}\vec v\times\vec B\cdot d\vec\ell \tag{13.39}$$

This transport theorem is the analog of Eq. (13.32) for a surface integral. 

In order to check this equation, and to see what the terms do, try some example vector functions that isolate the 
terms, so that only one of the terms on the right side of (13.39) is non-zero at a time. 


1: $\vec B = B_0\,\hat z\,t$, with a surface $z = 0$, $x^2 + y^2 < R^2$

For a constant $B_0$, and $\vec v = 0$, only the first term is present. The equation is $B_0\pi R^2 = B_0\pi R^2$.
Now take a static field

2: $\vec B = Cz\,\hat z$, with a moving surface $z = vt$, $x^2 + y^2 < R^2$

The first and third terms on the right vanish, and $\nabla\cdot\vec B = C$. The other terms are

$$\frac{d}{dt}\,Cz\,\hat z\cdot\pi R^2\hat z\,\Big|_{z=vt} = Cv\pi R^2 = \int (C)\,v\hat z\cdot d\vec A = Cv\pi R^2$$

Now take a uniform static field

3: $\vec B = B_0\,\hat z$ with a radially expanding surface $z = 0$, $x^2 + y^2 < R^2$, $R = vt$

The first and second terms on the right are now zero, and

$$\frac{d}{dt}\,B_0\pi(vt)^2 = 2B_0\pi v^2t = -\oint\big(v\hat r\times B_0\hat z\big)\cdot\hat\phi\,d\ell = -\oint\big(-vB_0\hat\phi\big)\cdot\hat\phi\,d\ell = +vB_0\,2\pi R\,\Big|_{R=vt} = 2B_0\pi v^2t$$

Draw some pictures of these three cases to see if the pictures agree with the algebra. 
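Alongside the pictures, the third case can be checked numerically. This sketch assumes illustrative values for `B0`, `v`, and `t`, and uses hypothetical helper names `flux` and `cross`. The left side is a finite-difference time derivative of the flux through the growing disk; the right side is the edge term $-\oint\vec v\times\vec B\cdot d\vec\ell$, summed around the rim with the rim velocity $v\hat r$.

```python
import math

B0, v, t = 0.8, 1.5, 2.0  # illustrative values

def flux(time):
    # Uniform field B0 z-hat through the disk of radius v*time in the plane z = 0.
    return B0 * math.pi * (v * time) ** 2

dt = 1e-5
lhs = (flux(t + dt) - flux(t - dt)) / (2 * dt)  # d(flux)/dt by central difference

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

R = v * t
n = 1000
h = 2 * math.pi / n
edge_term = 0.0
for i in range(n):
    phi = (i + 0.5) * h
    r_hat = (math.cos(phi), math.sin(phi), 0.0)
    phi_hat = (-math.sin(phi), math.cos(phi), 0.0)
    velocity = tuple(v * c for c in r_hat)   # the rim moves radially outward
    vxB = cross(velocity, (0.0, 0.0, B0))
    # dl = phi-hat * R * dphi, counterclockwise to match the normal z-hat
    edge_term += sum(a * b for a, b in zip(vxB, phi_hat)) * R * h

rhs = -edge_term  # the dB/dt and div B terms vanish for this field
print(lhs, rhs)
```

Reversing the direction of integration (replacing `phi_hat` by its negative) flips the sign of `rhs`, which is one way to see that the right-hand rule between the normal and the loop direction matters.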

Faraday’s Law 

If you now apply the transport theorem (13.39) to Maxwell’s equation (13.34), and use the fact that $\nabla\cdot\vec B = 0$ you get

$$\oint_{C(t)}\big(\vec E + \vec v\times\vec B\big)\cdot d\vec\ell = -\frac{d}{dt}\int_{S(t)}\vec B\cdot d\vec A \tag{13.40}$$


This is Faraday’s law, saying that the force per charge integrated around a closed loop (called the EMF) is the negative 
time derivative of the magnetic flux through the loop. 

Occasionally you will find an introductory physics text that writes Faraday's law without the v x B term. That’s 
o.k. as long as the integrals involve only stationary curves and surfaces, but some will try to apply it to generators, with 
moving conductors. This results in amazing contortions to try to explain the results. For another of Maxwell's equations, 
see problem 13.30. 

The electromagnetic force on a charge is $\vec F = q\big(\vec E + \vec v\times\vec B\big)$. This means that if a charge inside a conductor is free
to move, the force on it comes from both the electric and the magnetic fields in this equation. (The Lorentz force law.)
The integral of this force $\cdot\,d\vec\ell$ is the work done on a charge along some specified path. If this integral is independent of
path: $\nabla\times\vec E = 0$ and $\vec v = 0$, then this work divided by the charge is the potential difference, the voltage, between the
initial and final points. In the more general case, where one or the other of these requirements is false, then it's given 
the somewhat antiquated name EMF, for “electromotive force.” (It is often called “voltage” anyway, though if you're 
being fussy that's not really correct.) 

13.6 Fields as Vector Spaces 

It's sometimes useful to look back at the general idea of a vector space and to rephrase some common ideas in that 
language. Vector fields, such as E(x,y,z) can be added and multiplied by scalars. They form vector spaces, infinite 
dimensional of course. They even have a natural scalar product 


$$\big\langle\vec E_1, \vec E_2\big\rangle = \int d^3r\,\vec E_1(\vec r)\cdot\vec E_2(\vec r) \tag{13.41}$$

Here I'm assuming that the scalars are real numbers, though you can change that if you like. For this to make sense,
you have to assume that the fields are square integrable, but for the case of electric or magnetic fields that just means
that the total energy in the field is finite. Because these are supposed to satisfy some differential equations (Maxwell’s),
the derivative must also be square integrable, and I’ll require that they go to zero at infinity faster than $1/r^3$ or so.

The curl is an operator on this space, taking a vector field into another vector field. Recall the definitions of 
symmetric and hermitian operators from section 7.14. The curl satisfies the identity 


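$$\big\langle\vec E_1, \nabla\times\vec E_2\big\rangle = \big\langle\nabla\times\vec E_1, \vec E_2\big\rangle \tag{13.42}$$

For a proof, just write it out and then find the vector identity that will allow you to integrate by parts.

$$\nabla\cdot\big(\vec A\times\vec B\big) = \vec B\cdot\nabla\times\vec A - \vec A\cdot\nabla\times\vec B \tag{13.43}$$

Equation (13.42) is

$$\int d^3r\,\vec E_1(\vec r)\cdot\nabla\times\vec E_2(\vec r) = \int d^3r\,\big(\nabla\times\vec E_1(\vec r)\big)\cdot\vec E_2(\vec r) - \int d^3r\,\nabla\cdot\big(\vec E_1\times\vec E_2\big)$$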


The last integral becomes a surface integral by Gauss's theorem, $\oint d\vec A\cdot(\vec E_1\times\vec E_2)$, and you can now let the volume
(and so the surface) go to infinity. The fields go to zero sufficiently fast, so this is zero and the result is proved: Curl is 
a symmetric operator. Its eigenvalues are real and its eigenvectors are orthogonal. This is not a result you will use often, 
but the next one is important. 

Helmholtz Decomposition 

There are subspaces in this vector space of fields: (1) The set of all fields that are gradients. (2) The set of all fields 
that are curls. These subspaces are orthogonal to each other; every vector in the first is orthogonal to every vector in the 
second. To prove this, just use the same vector identity (13.43) and let $\vec A = \nabla f$. I will first present a restricted version
of this theorem because it's simpler. Assume that the domain is all space and that the fields and their derivatives all go 
to zero infinitely far away. A generalization to finite boundaries will be mentioned at the end. 

$$\nabla f\cdot\nabla\times\vec B = \vec B\cdot\nabla\times\nabla f - \nabla\cdot(\nabla f\times\vec B)$$

Calculate the scalar product of one vector field with the other.

$$\langle\nabla f, \nabla\times\vec B\rangle = \int d^3r\,\nabla f\cdot\nabla\times\vec B = \int d^3r\,\bigl[\vec B\cdot\nabla\times\nabla f - \nabla\cdot(\nabla f\times\vec B)\bigr] = 0 - \oint(\nabla f\times\vec B)\cdot d\vec A = 0 \tag{13.44}$$

As usual, the boundary condition that the fields and their derivatives go to zero rapidly at infinity kills the surface term.
This proves the result, that the two subspaces are mutually orthogonal. 
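
If you want to see this orthogonality in numbers, the following is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the test fields `f` and `B`, the cube size `L`, the grid count `n`, and the finite-difference step are arbitrary choices, not part of the derivation. It estimates $\langle\nabla f,\nabla\times\vec B\rangle$ with a midpoint rule and compares it with $\langle\nabla f,\nabla f\rangle$.

```python
import math

def f(x, y, z):
    """Hypothetical scalar field chosen for illustration; it decays rapidly."""
    return x * math.exp(-(x * x + y * y + z * z))

def B(x, y, z):
    """Hypothetical vector field chosen for illustration; it decays rapidly."""
    return (0.0, 0.0, y * math.exp(-(x * x + y * y + z * z)))

def grad(func, p, h=1e-4):
    """Central-difference gradient of a scalar function at the point p."""
    x, y, z = p
    return ((func(x + h, y, z) - func(x - h, y, z)) / (2 * h),
            (func(x, y + h, z) - func(x, y - h, z)) / (2 * h),
            (func(x, y, z + h) - func(x, y, z - h)) / (2 * h))

def curl(field, p, h=1e-4):
    """Central-difference curl of a vector function at the point p."""
    def d(component, axis):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        return (field(*hi)[component] - field(*lo)[component]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def inner(F, G, L=4.0, n=24):
    """Midpoint-rule estimate of the scalar product (13.41) over the cube [-L, L]^3."""
    dx = 2 * L / n
    pts = [-L + (k + 0.5) * dx for k in range(n)]
    total = 0.0
    for x in pts:
        for y in pts:
            for z in pts:
                p = (x, y, z)
                total += sum(a * b for a, b in zip(F(p), G(p)))
    return total * dx ** 3

def grad_f(p):
    return grad(f, p)

def curl_B(p):
    return curl(B, p)

overlap = inner(grad_f, curl_B)   # should be close to zero
norm_sq = inner(grad_f, grad_f)   # of order one, for comparison
print(overlap, norm_sq)
```

For these particular fields the integrand $\nabla f\cdot\nabla\times\vec B$ is not zero point by point; only its integral vanishes, which is what orthogonality means here.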

Do these two cases exhaust all possible vector fields? In this restricted case with no boundaries short of infinity, 
the answer is yes. The general case later will add other possibilities. Here you have two orthogonal subspaces, and to 
show that these two fill out the whole vector space, I will ask the question: what are all the vector fields orthogonal to 
both of them? I will show first that whatever they are will satisfy Laplace’s equation, and then the fact that the fields 
go to zero at infinity will be enough to show that this third case is identically zero. This statement is the Helmholtz 
theorem: Such vector fields can be written as the sum of two orthogonal fields: a gradient, and a curl. 

To prove it, my plan of attack is to show that if a field $\vec F$ is orthogonal to all gradients and to all curls, then $\nabla^2\vec F$
is orthogonal to all square-integrable vector fields. The only vector that is orthogonal to everything is the zero vector,
so $\vec F$ satisfies Laplace's equation. The assumption now is that for general $f$ and $\vec v$,

$$\int d^3r\,\vec F\cdot\nabla f = 0 \qquad\text{and}\qquad \int d^3r\,\vec F\cdot\nabla\times\vec v = 0$$

I want to show that for a general vector field $\vec u$,

$$\int d^3r\,\vec u\cdot\nabla^2\vec F = 0$$



The method is essentially two partial integrals, moving two derivatives from $\vec F$ over to $\vec u$. Start with the $\partial^2/\partial z^2$ term
in the Laplacian and hold off on the $dx$ and $dy$ integrals. Remember that all these functions go to zero at infinity. Pick
the $i$-component of $\vec u$ and the $j$-component of $\vec F$.

$$\int_{-\infty}^{\infty} dz\, u_i\,\frac{\partial^2}{\partial z^2}F_j = u_i\,\partial_z F_j\Big|_{-\infty}^{\infty} - \int dz\,(\partial_z u_i)(\partial_z F_j) = 0 - (\partial_z u_i)F_j\Big|_{-\infty}^{\infty} + \int dz\,(\partial_z^2 u_i)F_j = \int dz\,(\partial_z^2 u_i)F_j$$

Now reinsert the $dx$ and $dy$ integrals. Repeat this for the other two terms in the Laplacian, $\partial_x^2 F_j$ and $\partial_y^2 F_j$. The result
is

$$\int d^3r\,\vec u\cdot\nabla^2\vec F = \int d^3r\,(\nabla^2\vec u)\cdot\vec F \tag{13.45}$$

If this looks familiar it is just the three dimensional version of the manipulations that led to Eq. (5.15).
Now use the identity 

$$\nabla\times(\nabla\times\vec u) = \nabla(\nabla\cdot\vec u) - \nabla^2\vec u$$

in the right side of (13.45) to get

$$\int d^3r\,\vec u\cdot\nabla^2\vec F = \int d^3r\,\bigl[\bigl(\nabla(\nabla\cdot\vec u)\bigr)\cdot\vec F - \bigl(\nabla\times(\nabla\times\vec u)\bigr)\cdot\vec F\bigr] \tag{13.46}$$

The first term on the right is the scalar product of the vector field $\vec F$ with a gradient. The second term is the scalar
product with a curl. Both are zero by the hypotheses of the theorem, thereby demonstrating that the Laplacian of $\vec F$ is
orthogonal to everything, and so $\nabla^2\vec F = 0$.

When you do this in all of space, with the boundary conditions that the fields all go to zero at infinity, the only 
solutions to Laplace's equation are identically zero. In other words, the two vector spaces (the gradients and the curls) 
exhaust all the possibilities. How to prove this? Just pick a component, say $F_x$, treat it as simply a scalar function,
call it $f$, and apply a vector identity, problem 9.36.

$$\nabla\cdot(\phi\vec A) = (\nabla\phi)\cdot\vec A + \phi(\nabla\cdot\vec A)$$

Let $\phi = f$ and $\vec A = \nabla f$, then $\nabla\cdot(f\nabla f) = \nabla f\cdot\nabla f + f\nabla^2 f$

Integrate this over all space and apply Gauss's theorem. (I.e. integrate by parts.)

$$\int d^3r\,\nabla\cdot(f\nabla f) = \oint f\nabla f\cdot d\vec A = \int d^3r\,\bigl[\nabla f\cdot\nabla f + f\nabla^2 f\bigr] \tag{13.47}$$

If $f$ and its derivative go to zero fast enough at infinity (a modest requirement), the surface term, $\oint d\vec A$, goes to zero.
The Laplacian term vanishes because $\nabla^2 f = 0$, and all that's left is

$$\int d^3r\,\nabla f\cdot\nabla f = 0$$

This is the integral of a quantity that can never be negative. The only way that the integral can be zero is that the
integrand is zero. If $\nabla f = 0$, then $f$ is a constant, and if it must also go to zero far away then that constant is zero.

This combination of results, the Helmholtz theorem, describes a field as the sum of a gradient and a curl, but is
there a way to find these two components explicitly? Yes. 

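$$\vec F = \nabla f + \nabla\times\vec B, \quad\text{so}\quad \nabla\cdot\vec F = \nabla^2 f, \quad\text{and}\quad \nabla\times\vec F = \nabla\times\nabla\times\vec B = \nabla(\nabla\cdot\vec B) - \nabla^2\vec B$$

Solutions of these equations are

$$f(\vec r) = \frac{-1}{4\pi}\int d^3r'\,\frac{\nabla\cdot\vec F(\vec r\,')}{|\vec r - \vec r\,'|} \qquad\text{and}\qquad \vec B(\vec r) = \frac{1}{4\pi}\int d^3r'\,\frac{\nabla\times\vec F(\vec r\,')}{|\vec r - \vec r\,'|} \tag{13.48}$$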


Generalization 

In all this derivation, I assumed that the domain is all of three-dimensional space, and this made the calculations easier. 
A more general result lets you specify boundary conditions on some finite boundary and then a general vector field is 
the sum of as many as five classes of vector functions. This is the Helmholtz-Hodge decomposition theorem, and it has 
applications in the more complicated aspects of fluid flow (as if there are any simple ones), even in setting up techniques 
of numerical analysis for such problems. The details are involved, and I will simply refer you to a good review article* 
on the subject. 


Exercises 

1 For a circle, from the definition of the integral, what is $\oint d\vec\ell$? What is $\oint d\ell$? What is $\oint d\vec\ell\times\vec C$ where $\vec C$ is a constant
vector?

2 What is the work you must do in lifting a mass $m$ in the Earth's gravitational field from a radius $R_1$ to a radius $R_2$?
These are measured from the center of the Earth and the motion is purely radial.

3 Same as the preceding exercise but the motion is 1. due north a distance $R_1\theta_0$ then 2. radially out to $R_2$ then 3. due
south a distance $R_2\theta_0$.

* Cantarella, DeTurck, and Gluck: The American Mathematical Monthly, May 2002. The paper is an unusual mix 
of abstract topological methods and very concrete examples. It thereby gives you a fighting chance at the subject. 
4 Verify Stokes' Theorem by separately calculating the left and the right sides of the theorem for the case of the vector
field

$$\vec F(x,y) = \hat x Ay + \hat y Bx$$

around the rectangle $(a < x < b)$, $(c < y < d)$.


5 Verify Stokes' Theorem by separately calculating the left and the right sides of the theorem for the case of the vector
field

$$\vec F(x,y) = \hat x Ay - \hat y Bx$$

around the rectangle $(a < x < b)$, $(c < y < d)$.


6 Verify Stokes' Theorem for the semi-cylinder $0 < z < h$, $0 < \phi < \pi$, $r = R$. The vector field is
$\vec F(r,\phi,z) = \hat r Ar^2\sin\phi + \hat\phi Br\phi^2 z + \hat z Crz^2$


7 Verify Gauss's Theorem using the whole cylinder $0 < z < h$, $r = R$ and the vector field
$\vec F(r,\phi,z) = \hat r Ar^2\sin\phi + \hat\phi Brz\sin^2\phi + \hat z Crz^2$.


8 What would happen if you used the volume of the preceding exercise and the field of the exercise before that one to 
check Gauss's law? 
Problems

13.1 In the equation (13.4) what happens if you start with a different parametrization for $x$ and $y$, perhaps
$x = R\cos(\phi'/2)$ and $y = R\sin(\phi'/2)$ for $0 < \phi' < 4\pi$? Do you get the same answer?

13.2 What is the length of the arc of the parabola $y = (a^2 - x^2)/b$, $(-a < x < a)$?
But first draw a sketch and make a rough estimate of what the result ought to be. Then do the calculation and compare
the answers. What limiting cases allow you to check your result?
Ans: $(b/2)\bigl[\sinh^{-1}c + c\sqrt{1 + c^2}\,\bigr]$ where $c = 2a/b$
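
One way to check the quoted answer to 13.2 is to compare it with a brute-force numerical arc length. The sketch below is only an illustration: the function names, the midpoint rule, and the sample values $a = 1$, $b = 3$ are assumptions of mine, not part of the problem.

```python
import math

def arc_length_numeric(a, b, n=20_000):
    """Midpoint-rule estimate of the length of y = (a^2 - x^2)/b for -a <= x <= a."""
    dx = 2 * a / n
    total = 0.0
    for k in range(n):
        x = -a + (k + 0.5) * dx
        slope = -2 * x / b                      # dy/dx along the parabola
        total += math.sqrt(1 + slope * slope) * dx
    return total

def arc_length_closed(a, b):
    """The quoted answer (b/2)[asinh(c) + c*sqrt(1 + c^2)] with c = 2a/b."""
    c = 2 * a / b
    return (b / 2) * (math.asinh(c) + c * math.sqrt(1 + c * c))

print(arc_length_numeric(1.0, 3.0), arc_length_closed(1.0, 3.0))
# Limiting case: a very large b flattens the parabola, so the length tends to 2a.
print(arc_length_closed(1.0, 1.0e6))
```

The second printed line is the kind of limiting-case check the problem asks about: as $b\to\infty$ the curve becomes a straight segment of length $2a$.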

13.3 (a) You can describe an ellipse as $x = a\cos\phi$, $y = b\sin\phi$. (Prove this.)
(b) Warm up by computing the area of the ellipse.
(c) What is the circumference of this ellipse? You will find a (probably) unfamiliar integral here, so to put this integral
into a standard form, note that it is $4\int_0^{\pi/2}$. Then use $\cos^2\phi = 1 - \sin^2\phi$. Finally, look up chapter 17, Elliptic Integrals,
of Abramowitz and Stegun. You will find the reference to this at the end of section 1.4. Notice in this integral that
when you integrate, it will not matter whether you have a $\sin^2$ or a $\cos^2$. Ans: $4aE(m)$

13.4 For another derivation of the work-energy theorem, one that doesn't use the manipulations of calculus as in
Eq. (13.11), go back to basics.
(a) For a constant force, start from $\vec F = m\vec a$ and derive by elementary manipulations that

$$\vec F\cdot\Delta\vec r = \frac{m}{2}\bigl[v_f^2 - v_i^2\bigr]$$

All that you need to do is to note that the acceleration is a constant so you can get $\vec v$ and $\vec r$ as functions of time. Then
eliminate $t$.
(b) Along a specified curve, divide the curve at points

$$\vec r_i = \vec r_0,\ \vec r_1,\ \vec r_2,\ \ldots,\ \vec r_N = \vec r_f$$

In each of these intervals apply the preceding equation. This makes sense in that if the interval is small the force won't
change much in the interval.
(c) Add all these $N$ equations and watch the kinetic energy terms telescope and (mostly) cancel. This limit as all the
$\Delta\vec r_k \to 0$ is Eq. (13.12).

13.5 The manipulation in the final step of Eq. (13.12) seems almost too obvious. Is it? Well yes, but write out the
definition of this integral as the limit of a sum to verify that it really is easy. 


13.6 Mimic the derivation of Gauss's theorem, Eq. (13.15), and derive the identities

$$\oint_S d\vec A\times\vec v = \int_V \operatorname{curl}\vec v\,dV, \qquad\text{and}\qquad \oint_S f\,d\vec A = \int_V \operatorname{grad} f\,dV$$

13.7 The force by a magnetic field on a small piece of wire, length $d\ell$, and carrying a current $I$ is $d\vec F = I\,d\vec\ell\times\vec B$. The
total force on a wire carrying this current in a complete circuit is the integral of this. Let $\vec B = \hat x Ay - \hat y Ax$. The wire
consists of the line segments around the rectangle $0 < x < a$, $0 < y < b$. The direction of the current is in the $+\hat y$
direction on the $x = 0$ line. What is the total force on the loop? Ans: 0

13.8 Verify Stokes' theorem for the field $\vec F = Axy\,\hat x + B(1 + x^2y^2)\,\hat y$ and for the rectangular loop $a < x < b$,
$c < y < d$.

13.9 Which of the two times in Eqs. (13.7) and (13.8) is shorter? (Compare their squares; it's easier.)

13.10 Write the equations (9.36) in an integral form. 

13.11 Start with Stokes’ theorem and shrink the boundary curve to a point. That doesn’t mean there’s 
no surface left; it’s not flat, remember. The surface is pinched off like a balloon. It is now a closed 
surface, and what is the value of this integral? Now apply Gauss's theorem to it and what do you get? 

Ans: See Eq. (9.34) 

13.12 Use the same surface as in the example, Eq. (13.25), and verify Stokes' theorem for the vector field

$$\vec F = \hat r Ar^{-1}\cos^2\theta\sin\phi + \hat\theta Br^2\sin\theta\cos^2\phi + \hat\phi Cr^{-2}\cos^2\theta\sin^2\phi$$

13.13 Use the same surface as in the example, Eq. (13.25), and examine Stokes' theorem for the vector field

$$\vec F = \hat r f(r,\theta,\phi) + \hat\theta g(r,\theta,\phi) + \hat\phi h(r,\theta,\phi)$$

(a) Show from the line integral part that the answer can depend only on the function $h$, not $f$ or $g$. (b) Now examine
the surface integral over this cap and show the same thing. 

13.14 For the vector field in the $x$-$y$ plane: $\vec F = (x\hat y - y\hat x)/2$, use Stokes' theorem to compute the
line integral of $\vec F\cdot d\vec r$ around an arbitrary closed curve. What is the significance of the sign of the
result? When you considered an “arbitrary” loop, did you consider the possibilities presented by these 
curves? 

13.15 What is the (closed) surface integral of $\vec F = \vec r/3$ over an arbitrary closed surface? Ans: $V$.

13.16 What is the (closed) surface integral of $\vec F = \vec r/3$ over an arbitrary closed surface? This time however, the surface
integral uses the cross product: $\oint d\vec A\times\vec F$. If in doubt, try drawing the picture for a special case first.

13.17 For the vector field Eq. (13.28) explicitly show that $\oint\vec v\cdot d\vec r$ is zero for a curve such as that in the
picture and that it is not zero for a circle going around the singularity.

13.18 Refer to Eq. (13.27) and check it for small $\theta_0$. Notice the combination $\pi(R\theta_0)^2$.

13.19 For the vector field, Eq. (13.28), use Eq. (13.29) to try to construct a potential function. Because within a certain 
domain the integral is independent of path, you can pick the most convenient possible path, the one that makes the 
integration easiest. What goes wrong? 

13.20 Refer to problem 9.33 and construct the solutions by integration, using the methods of this chapter. 

13.21 (a) Evaluate $\oint\vec F\cdot d\vec r$ for $\vec F = \hat x Axy + \hat y Bx$ around the circle of radius $R$ centered at the origin.
(b) Now do it again, using Stokes' theorem this time.

13.22 Same as the preceding problem, but $\oint d\vec r\times\vec F$ instead.

13.23 Use the same field as the preceding two problems and evaluate the surface integral of $\vec F\cdot d\vec A$ over the hemispherical
surface $x^2 + y^2 + z^2 = R^2$, $z > 0$.

13.24 The same field and surface as the preceding problem, but now the surface integral $\int d\vec A\times\vec F$. Ans: $\hat z\,2\pi BR^3/3$

13.25 (a) Prove the identity $\nabla\cdot(\vec A\times\vec B) = \vec B\cdot\nabla\times\vec A - \vec A\cdot\nabla\times\vec B$. (index mechanics?)
(b) Next apply Gauss's theorem to $\nabla\cdot(\vec A\times\vec B)$ and take the special case that $\vec B$ is an arbitrary constant to derive
Eq. (13.19).

13.26 (a) Prove the identity $\nabla\cdot(f\vec F) = f\,\nabla\cdot\vec F + \vec F\cdot\nabla f$.
(b) Apply Gauss's theorem to $\nabla\cdot(f\vec F)$ for an arbitrary constant $\vec F$ to derive a result found in another problem.

(c) Explain why the word “arbitrary” is necessary here. 

13.27 The vector potential is not unique, as you can add an arbitrary gradient to it without affecting its curl. Suppose
that $\vec B = \nabla\times\vec A$ with

$$\vec A = \hat x\,\alpha xyz + \hat y\,\beta x^2z + \hat z\,\gamma xyz^2$$

Find a function $f(x,y,z)$ such that $\vec A' = \vec A + \nabla f$ has the $z$-component identically zero. Do you get the same $\vec B$ by
taking the curl of $\vec A$ and of $\vec A'$?


13.28 Take the vector field

$$\vec B = \alpha xy\,\hat x + \beta xy\,\hat y + \gamma(xz + yz)\,\hat z$$

Write out the equation $\vec B = \nabla\times\vec A$ in rectangular components and figure out what functions $A_x(x,y,z)$, $A_y(x,y,z)$,
and $A_z(x,y,z)$ will work. Note: From the preceding problem you see that you may if you wish pick any one of the
components of $\vec A$ to be zero; that will cut down on the labor. Also, you should expect that this problem is impossible
unless $\vec B$ has zero divergence. That fact should come out of your calculations, so don't put it in yet. Determine the
conditions on $\alpha$, $\beta$, and $\gamma$ that make this problem solvable, and show that this is equivalent to $\nabla\cdot\vec B = 0$.


13.29 A magnetic monopole, if it exists, will have a magnetic field $\mu_0 q_m\hat r/4\pi r^2$. The divergence of this magnetic
field is zero except at the origin, but that means that not every closed surface can be shrunk to a point without
running into the singularity. The necessary condition for having a vector potential is not satisfied. Try to construct
such a potential anyway. Assume a solution in spherical coordinates of the form $\vec A = \hat\phi f(r)g(\theta)$ and figure out what
$f$ and $g$ will have this $\vec B$ for a curl. Sketch the resulting $\vec A$. You will run into a singularity (or two, depending).
Ans: $\vec A = \hat\phi\,\mu_0 q_m(1 - \cos\theta)/(4\pi r\sin\theta)$ (not unique)


13.30 Apply the Reynolds transport theorem to the other of Maxwell's equations.

$$\nabla\times\vec B = \mu_0\vec j + \mu_0\epsilon_0\frac{\partial\vec E}{\partial t}$$

Don't simply leave the result in the first form that you find. Manipulate it into what seems to be the best form. Use
$\mu_0\epsilon_0 = 1/c^2$.
Ans: $\oint\bigl(\vec B - \vec v\times\vec E/c^2\bigr)\cdot d\vec\ell = \mu_0\int(\vec j - \rho\vec v)\cdot d\vec A + \mu_0\epsilon_0\,(d/dt)\int\vec E\cdot d\vec A$

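13.31 Derive the analog of the Reynolds transport theorem, Eq. (13.39), for a line integral around a closed loop.

$$\text{(a)}\qquad \frac{d}{dt}\oint_{C(t)}\vec F(\vec r,t)\cdot d\vec\ell = \oint_{C(t)}\frac{\partial\vec F}{\partial t}\cdot d\vec\ell - \oint_{C(t)}\vec v\times(\nabla\times\vec F)\cdot d\vec\ell$$

and for the surface integral of a scalar. You will need problem 13.6.

$$\text{(b)}\qquad \frac{d}{dt}\int_{S(t)}\phi(\vec r,t)\,d\vec A = \int_{S(t)}\frac{\partial\phi}{\partial t}\,d\vec A + \int_{S(t)}(\nabla\phi)\,d\vec A\cdot\vec v - \oint_{C(t)}\phi\,d\vec\ell\times\vec v$$
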


Make up examples that test the validity of individual terms in the equations. I recommend cylindrical coordinates for 
your examples. 

13.32 Another transport theorem is more difficult to derive.

$$\frac{d}{dt}\oint_{C(t)}d\vec\ell\times\vec F(\vec r,t) = \oint_{C(t)}d\vec\ell\times\frac{\partial\vec F}{\partial t} + \oint_{C(t)}(\nabla\cdot\vec F)\,d\vec\ell\times\vec v - \oint_{C(t)}(\nabla\vec F)\cdot d\vec\ell\times\vec v$$

I had to look up some vector identities, including one for $\nabla\times(\vec A\times\vec B)$. A trick that I found helpful: At a certain point
take the dot product of the whole equation with a fixed vector $\vec B$ and manipulate the resulting product, finally factoring
out the arbitrary vector $\vec B\,\cdot$ at the end. Make up examples that test the validity of individual terms in the equations.
Again, I recommend cylindrical coordinates for your examples.

13.33 Apply Eq. (13.39) to the velocity field itself. That is, let $\vec B = \vec v$. Suppose further that the fluid is incompressible
with $\nabla\cdot\vec v = 0$ and that the flow is stationary (no time dependence). Explain the results.

13.34 Assume that the Earth's atmosphere obeys the density equation $\rho = \rho_0 e^{-z/h}$ for a height $z$ above the surface.
(a) Through what amount of air does sunlight have to travel when coming from straight overhead? Take the measure
of this to be $\int\rho\,d\ell$ (called the "air mass"). (b) Through what amount of air does sunlight have to travel when coming
from just on the horizon at sunset? Neglect the fact that light will refract in the atmosphere and that the path in the
second case won't really be a straight line. Take $h = 10$ km and the radius of the Earth to be 6400 km. The integral
you get for the second case is probably not familiar. You may evaluate it numerically for the numbers that I stated, or
you may look it up in a big table of integrals such as Gradshteyn and Ryzhik, or you may use an approximation, $h \ll R$.
(I recommend the last.) What is the numerical value of the ratio of these two air mass integrals? This goes far in
explaining why you can look at the setting sun.

(c) If refraction in the atmosphere is included, the light will bend and pass through a still larger air mass. The overall 
refraction comes to about 0.5°, and calculating the path that light takes is hard, but you can find a bound on the answer 
by assuming a path that follows the surface of the Earth through this angle and then takes off on a straight line. What 
is the air mass ratio in this case? The real answer is somewhere between the two calculations. (The really real answer is 
a little bigger than either because the atmosphere is not isothermal and so the approximation $\rho = \rho_0 e^{-z/h}$ is not exact.)
Ans: $\approx\sqrt{R\pi/2h} = 32$, $+R\theta/h \to 37$.
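
The numerical route mentioned in 13.34 can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the cutoff `s_max`, and the step count are my own choices, and the geometry used is the straight tangent ray, for which the height after a distance $s$ is $z = \sqrt{R^2 + s^2} - R$. The overhead air mass is $\rho_0 h$, so $\rho_0$ cancels in the ratio.

```python
import math

def horizon_to_overhead_ratio(R=6400.0, h=10.0, s_max=3000.0, n=30_000):
    """Ratio of horizontal to vertical air mass for rho = rho0*exp(-z/h); lengths in km."""
    ds = s_max / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * ds              # midpoint of the k-th step along the ray
        z = math.hypot(R, s) - R        # height of that point above the surface
        total += math.exp(-z / h) * ds
    return total / h                    # divide by the overhead value, rho0*h

def bent_path_ratio(R=6400.0, h=10.0, bend_degrees=0.5):
    """Bound for part (c): follow the surface through the bend angle, then go straight."""
    return R * math.radians(bend_degrees) / h + horizon_to_overhead_ratio(R, h)

print(horizon_to_overhead_ratio())                   # numerical integral
print(math.sqrt(math.pi * 6400.0 / (2 * 10.0)))      # the h << R approximation
print(bent_path_ratio())                             # with the 0.5 degree bend
```

The first two printed numbers should agree to better than a percent, which is a check on the approximation $h \ll R$ rather than a derivation of it.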

13.35 Work in a thermodynamic system is calculated from $dW = P\,dV$. Assume an ideal gas, so that
$PV = nRT$. (a) What is the total work, $\oint dW$, done around this cycle as the pressure increases at constant
volume, then decreases at constant temperature, finally the volume decreases at constant pressure?

(b) In the special case for which the changes in volume and pressure are very small, estimate from the
graph approximately what to expect for the answer. Now do an expansion of the result of part (a) to see
if it agrees with what you expect. Ans: $\approx\Delta P\,\Delta V/2$

13.36 Verify the divergence theorem for the vector field

$$\vec F = \alpha xyz\,\hat x + \beta x^2z(1 + y)\,\hat y + \gamma xyz^2\,\hat z$$

and for the volume $(0 < x < a)$, $(0 < y < b)$, $(0 < z < c)$.

13.37 Evaluate $\int\vec F\cdot d\vec A$ over the curved surface of the hemisphere $x^2 + y^2 + z^2 = R^2$ and $z > 0$. The vector field is
given by $\vec F = \nabla\times(\alpha y\,\hat x + \beta x\,\hat y + \gamma xy\,\hat z)$. Ans: $(\beta - \alpha)\pi R^2$

13.38 A vector field is given in cylindrical coordinates to be $\vec F = \hat r\,\alpha r^2z\sin^2\phi + \hat\phi\,\beta rz + \hat z\,\gamma zr\cos^2\phi$. Verify the
divergence theorem for this field for the region $(0 < r < R)$, $(0 < \phi < 2\pi)$, $(0 < z < h)$.

13.39 For the function $F(r,\theta) = r^n(A + B\cos\theta + C\cos^2\theta)$, compute the gradient and then the divergence of this
gradient. For what values of the constants $A$, $B$, $C$, and (positive, negative, or zero) integer $n$ is this result, $\nabla\cdot\nabla F$,
zero? These coordinates are spherical, and this combination div grad is the Laplacian.

Ans: In part, $n = 2$, $C = -3A$, $B = 0$.

13.40 Repeat the preceding problem, but now interpret the coordinates as cylindrical (change $\theta$ to $\phi$). And don't
necessarily leave your answers in the first form that you find them. 

13.41 Evaluate the integral $\int\vec F\cdot d\vec A$ over the surface of the hemisphere $x^2 + y^2 + z^2 = 1$ with $z > 0$. The vector field
is $\vec F = A(1 + x + y)\hat x + B(1 + y + z)\hat y + C(1 + z + x)\hat z$. You may choose to do this problem the hard way or the easy
way. Or both.

Ans: $\pi(2A + 2B + 5C)/3$

13.42 An electric field is known in cylindrical coordinates to be $\vec E = f(r)\hat r$, and the electric charge density is a function
of $r$ alone, $\rho(r)$. They satisfy the Maxwell equation $\nabla\cdot\vec E = \rho/\epsilon_0$. If the charge density is given as $\rho(r) = \rho_0 e^{-r/r_0}$,
compute $\vec E$. Demonstrate the behavior of $\vec E$ for large $r$ and for small $r$.

13.43 Repeat the preceding problem, but now $r$ is a spherical coordinate.

13.44 Find a vector field $\vec F$ such that $\nabla\cdot\vec F = \alpha x + \beta y + \gamma$ and $\nabla\times\vec F = \hat z$. Next, find an infinite number of such
fields.

13.45 Gauss's law says that the total charge contained inside a surface is $\epsilon_0\oint\vec E\cdot d\vec A$. For the electric field of
problem 10.37, evaluate this integral over a sphere of radius $R_1 > R$ and centered at the origin.

13.46 (a) In cylindrical coordinates, for what $n$ does the vector field $\vec v = r^n\hat\phi$ have curl equal to zero? Draw it.
(b) Also, for the same closed path as in problem 13.17 and for all $n$, compute $\oint\vec v\cdot d\vec r$.

13.47 Prove the identity Eq. (13.43). Write it out in index notation first.

13.48 There is an analog of Stokes' theorem for $\oint d\vec\ell\times\vec B$. This sort of integral comes up in computing the total force
on the current in a circuit. Try multiplying (dot) the integral by a constant vector $\vec C$. Then manipulate the result by
standard methods and hope that in the end you have the same constant $\vec C\cdot$ something.

Ans: $= \int\bigl[(\nabla\vec B)\cdot d\vec A - (\nabla\cdot\vec B)\,d\vec A\bigr]$ and the second term vanishes for magnetic fields.

13.49 In the example (13.16) using Gauss's theorem, the term in $\gamma$ contributed zero to the surface integral (13.17).
In the subsequent volume integral the same term vanished because of the properties of $\sin\phi\cos\phi$. But this term will
vanish in the surface integral no matter what the function of $\phi$ is in the $\hat\phi$ component of the vector $\vec F$. How then is it
always guaranteed to vanish in the volume integral?

13.50 Interpret the vector field $\vec F$ from problem 13.37 as an electric field $\vec E$, then use Gauss's law that $q_{\text{enclosed}} =
\epsilon_0\oint\vec E\cdot d\vec A$ to evaluate the charge enclosed within a sphere of radius $R$ centered at the origin.

13.51 Derive the identity Eq. (13.32) starting from the definition of a derivative and doing the same sort of manipulation 
that you use in deriving the ordinary product rule for differentiation. 

13.52 A right tetrahedron has three right triangular sides that meet in one vertex. Think of a corner 
chopped off of a cube. The sum of the squares of the areas of the three right triangles equals the square 
of the area of the fourth face. The results of problem 13.6 will be useful. 
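
Before attempting a proof of 13.52, it can help to test the claim on numbers. The sketch below is illustrative only: it assumes the right-angle vertex sits at the origin with legs of length `a`, `b`, `c` along the axes, and the function names are hypothetical.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def face_areas(a, b, c):
    """Areas of the three right-triangle faces and of the slanted fourth face."""
    right_faces = (a * b / 2, b * c / 2, c * a / 2)
    # Two edges of the slanted face, from the vertex (a,0,0) to (0,b,0) and to (0,0,c).
    e1 = (-a, b, 0.0)
    e2 = (-a, 0.0, c)
    n = cross(e1, e2)
    slanted = 0.5 * math.sqrt(sum(comp * comp for comp in n))
    return right_faces, slanted

right_faces, slanted = face_areas(1.0, 2.0, 3.0)
print(sum(A * A for A in right_faces), slanted ** 2)
```

Agreement for a few choices of the legs is of course not a proof, but it tells you what the vector-area identities of problem 13.6 ought to deliver.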



Complex Variables 


In the calculus of functions of a complex variable there are three fundamental tools, the same fundamental tools as
for real variables: differentiation, integration, and power series. I'll first introduce all three in the context of complex
variables, then show the relations between them. The applications of the subject will form the major part of the chapter. 

14.1 Differentiation 

When you try to differentiate a continuous function is it always differentiable? If it's differentiable once is it differentiable 
again? The answer to both is no. Take the simple absolute value function of the real variable x. 


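$$f(x) = |x| = \begin{cases} x & (x \ge 0)\\ -x & (x < 0)\end{cases}$$

This has a derivative for all $x$ except zero. The limit

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} \longrightarrow \begin{cases} 1 & (x > 0)\\ -1 & (x < 0)\\ ? & (x = 0)\end{cases} \tag{14.1}$$

works for both $x > 0$ and $x < 0$. If $x = 0$ however, you get a different result depending on whether $\Delta x \to 0$ through
positive or through negative values.

If you integrate this function,

$$\int_0^x |x'|\,dx' = \begin{cases} x^2/2 & (x \ge 0)\\ -x^2/2 & (x < 0)\end{cases}$$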


the result has a derivative everywhere, including the origin, but you can't differentiate it twice. A few more integrations 
and you can produce a function that you can differentiate 42 times but not 43. 
There are functions that are continuous but with no derivative anywhere. They're harder* to construct, but if
you grant their existence then you can repeat the preceding manipulation and create a function with any number of 
derivatives everywhere, but no more than that number anywhere. 

For a derivative to exist at a point, the limit Eq. (14.1) must have the same value whether you take the limit from 
the right or from the left. 

Extend the idea of differentiation to complex-valued functions of complex variables. Just change the letter $x$ to
the letter $z = x + iy$. Examine a function such as $f(z) = z^2 = x^2 - y^2 + 2ixy$ or $\cos z = \cos x\cosh y - i\sin x\sinh y$.
Can you differentiate these (yes) and what does that mean?

$$f'(z) = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{df}{dz} \tag{14.2}$$

is the appropriate definition, but for it to exist there are even more restrictions than in the real case. For real functions
you have to get the same limit as $\Delta x \to 0$ whether you take the limit from the right or from the left. In the complex
case there are an infinite number of directions through which $\Delta z$ can approach zero and you must get the same answer
from all directions. This is such a strong restriction that it isn't obvious that any function has a derivative. To reassure
you that I'm not talking about an empty set, differentiate $z^2$.

$$\frac{(z + \Delta z)^2 - z^2}{\Delta z} = \frac{2z\,\Delta z + (\Delta z)^2}{\Delta z} = 2z + \Delta z \longrightarrow 2z$$

It doesn't matter whether $\Delta z = \Delta x$ or $= i\Delta y$ or $= (1 + i)\Delta t$. As long as it goes to zero you get the same answer.
For a contrast take the complex conjugation function, $f(z) = z^* = x - iy$. Try to differentiate that.

$$\frac{(z + \Delta z)^* - z^*}{\Delta z} = \frac{(\Delta z)^*}{\Delta z} = \frac{\Delta r\,e^{-i\theta}}{\Delta r\,e^{i\theta}} = e^{-2i\theta}$$

The polar form of the complex number is more convenient here, and you see that as the distance $\Delta r$ goes to zero, this
difference quotient depends on the direction through which you take the limit. From the right and the left you get $+1$.
From above and below ($\theta = \pm\pi/2$) you get $-1$. The limits aren't the same, so this function has no derivative anywhere.
Roughly speaking, the functions that you're familiar with or that are important enough to have names (sin, cos, tanh,
Bessel, elliptic, . . . ) will be differentiable as long as you don't have an explicit complex conjugation in them. Something
such as $|z| = \sqrt{z^*z}$ does not have a derivative for any $z$.
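
The direction dependence is easy to see numerically. The following is a small sketch using Python's standard `cmath` module; the helper name `difference_quotient`, the sample point, and the step size are arbitrary choices of mine for illustration.

```python
import cmath

def difference_quotient(f, z, direction_angle, step=1e-6):
    """(f(z + dz) - f(z)) / dz with dz = step * exp(i * direction_angle)."""
    dz = step * cmath.exp(1j * direction_angle)
    return (f(z + dz) - f(z)) / dz

def square(z):
    return z * z

def conjugate(z):
    return z.conjugate()

z0 = 1.0 + 2.0j
for angle in (0.0, cmath.pi / 4, cmath.pi / 2):
    # z^2 gives nearly 2*z0 from every direction; z* gives exp(-2i*angle).
    print(angle,
          difference_quotient(square, z0, angle),
          difference_quotient(conjugate, z0, angle))
```

For $z^2$ the three quotients agree to within the step size, while for $z^*$ they follow $e^{-2i\theta}$ and so differ from one direction to the next.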

* Weierstrass surprised the world of mathematics with $\sum_0^\infty a^k\cos(b^k x)$. If $a < 1$ while $ab > 1$ this is continuous
but has no derivative anywhere. This statement is much more difficult to prove than it looks.

For functions of a real variable, having one or fifty-one derivatives doesn't guarantee you that it has two or fifty-two.
The amazing property of functions of a complex variable is that if a function has a single derivative everywhere in the
neighborhood of a point then you are guaranteed that it has an infinite number of derivatives. You will also be assured
that you can do a power series expansion about that point and that the series will always converge to the function.
There are important and useful integration methods that will apply to all these functions, and for a relatively small effort
they will open impressively large vistas of mathematics.

For an example of the insights that you gain using complex variables, consider the function $f(x) = 1/(1 + x^2)$.
This is a perfectly smooth function of $x$, starting at $f(0) = 1$ and slowly dropping to zero as $x \to \pm\infty$. Look at the
power series expansion about $x = 0$ however. This is just a geometric series in $(-x^2)$, so

$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \cdots$$

This converges only if $-1 < x < +1$. Why such a limitation? The function is infinitely differentiable for all $x$ and is
completely smooth throughout its domain. This remains mysterious as long as you think of $x$ as a real number. If you
expand your view and consider the function of the complex variable $z = x + iy$, then the mystery disappears. $1/(1 + z^2)$
blows up when $z \to \pm i$. The reason that the series fails to converge for values of $|x| > 1$ lies in the complex plane, in
the fact that at the distance $= 1$ in the $i$-direction there is a singularity, and in the fact that the domain of convergence
is a disk extending out to the nearest singularity.
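
A few partial sums make the contrast concrete. This is only a sketch; the function name `partial_sum` and the sample points $x = 0.5$ and $x = 1.5$ are my own choices.

```python
def partial_sum(x, terms):
    """Sum of the first `terms` terms of the geometric series 1 - x^2 + x^4 - ..."""
    return sum((-x * x) ** k for k in range(terms))

for x in (0.5, 1.5):
    exact = 1.0 / (1.0 + x * x)
    print(x, exact, [partial_sum(x, n) for n in (5, 10, 20, 40)])
```

Inside the disk the partial sums settle down onto $1/(1+x^2)$; at $x = 1.5$ they swing between ever larger positive and negative values even though the function itself is perfectly tame there.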


Definition: A function is said to be analytic at the point $z_0$ if
it is differentiable for every point $z$ in the disk $|z - z_0| < \epsilon$.
Here the positive number $\epsilon$ may be small, but it is not zero.


Necessarily if $f$ is analytic at $z_0$ it will also be analytic at every point within the disk $|z - z_0| < \epsilon$.
This follows because at any point $z_1$ within the original disk you have a disk centered at $z_1$ and of radius
$(\epsilon - |z_1 - z_0|)/2$ on which the function is differentiable.

The common formulas for differentiation are exactly the same for complex variables as they are for real 
variables, and their proofs are exactly the same. For example, the product formula: 


$$\frac{f(z + \Delta z)g(z + \Delta z) - f(z)g(z)}{\Delta z} = \frac{f(z + \Delta z)g(z + \Delta z) - f(z)g(z + \Delta z) + f(z)g(z + \Delta z) - f(z)g(z)}{\Delta z}$$

$$= \frac{f(z + \Delta z) - f(z)}{\Delta z}\,g(z + \Delta z) + f(z)\,\frac{g(z + \Delta z) - g(z)}{\Delta z}$$

As $\Delta z \to 0$, this becomes the familiar $f'g + fg'$. That the numbers are complex made no difference.
For integer powers you can use induction, just as in the real case: $dz/dz = 1$ and

$$\text{If}\quad \frac{dz^n}{dz} = nz^{n-1}, \quad\text{then use the product rule}\quad \frac{dz^{n+1}}{dz} = \frac{d(z^n\cdot z)}{dz} = nz^{n-1}\cdot z + z^n\cdot 1 = (n + 1)z^n$$

The other differentiation techniques are in the same spirit. They follow very closely from the definition. For
example, how do you handle negative powers? Simply note that $z^n z^{-n} = 1$ and use the product formula. The chain
rule, the derivative of the inverse of a function, all the rest, are close to the surface.


14.2 Integration 

The standard Riemann integral of section 1.6 is

$$\int_a^b f(x)\,dx = \lim_{\Delta x_k\to 0}\sum_{k=1}^N f(\xi_k)\,\Delta x_k$$

The extension of this to complex functions is direct. Instead of partitioning the interval $a < x < b$ into $N$ pieces,
you have to specify a curve in the complex plane and partition it into $N$ pieces. The interval is the complex number
$\Delta z_k = z_k - z_{k-1}$.

$$\int_C f(z)\,dz = \lim_{\Delta z_k\to 0}\sum_{k=1}^N f(\zeta_k)\,\Delta z_k$$

Just as $\xi_k$ is a point in the $k^{\text{th}}$ interval, so is $\zeta_k$ a point in the $k^{\text{th}}$ interval along the curve $C$.

How do you evaluate these integrals? Pretty much the same way that you evaluate line integrals in vector calculus.
You can write this as

$$\int_C f(z)\,dz = \int_C\bigl(u(x,y) + iv(x,y)\bigr)(dx + i\,dy) = \int_C\bigl[(u\,dx - v\,dy) + i(u\,dy + v\,dx)\bigr]$$

If you have a parametric representation for the values of $x(t)$ and $y(t)$ along the curve this is

$$\int_{t_1}^{t_2}\bigl[(u\dot x - v\dot y) + i(u\dot y + v\dot x)\bigr]\,dt$$

For example take the function $f(z) = z$ and integrate it around a circle centered at the origin, $x = R\cos\theta$, $y = R\sin\theta$.


J zdz = J \(x dx — y dy) + i(x dy + y dx)\ 

= / d6R 2 [(— cos 6 sin 6 — sin 6 cos 9) + i(cos 2 6 — sin 2 #)] = 0 


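Wouldn't it be easier to do this in polar coordinates? $z = re^{i\theta}$.

$$
\int z\,dz = \int re^{i\theta}\bigl[e^{i\theta}\,dr + i r e^{i\theta}\,d\theta\bigr]
= \int_0^{2\pi} Re^{i\theta}\, iRe^{i\theta}\,d\theta = iR^2 \int_0^{2\pi} e^{2i\theta}\,d\theta = 0
\tag{14.3}
$$

Do the same thing for the function $1/z$. Use polar coordinates.

$$
\int \frac{1}{z}\,dz = \int_0^{2\pi} \frac{1}{Re^{i\theta}}\, iRe^{i\theta}\,d\theta = \int_0^{2\pi} i\,d\theta = 2\pi i
\tag{14.4}
$$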


This is an important result! Do the same thing for z n where n is any positive or negative integer, problem 14.1. 
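
A quick numerical check of Eqs. (14.3) and (14.4) can make the difference between the two results concrete. The following is only a sketch: the helper name `circle_integral`, the radius, and the number of sample points are illustrative choices, not anything from the text. It approximates the contour integral by the defining sum, with $dz = iz\,d\theta$ on the circle.

```python
import cmath
import math

def circle_integral(f, radius=1.0, n=2000):
    """Approximate the integral of f(z) dz counterclockwise around |z| = radius.

    Hypothetical helper: it evaluates the Riemann sum of the definition,
    using the midpoint of each small arc as the sample point.
    """
    total = 0j
    dtheta = 2 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta            # midpoint of the k-th arc
        z = radius * cmath.exp(1j * theta)    # point on the circle
        dz = 1j * z * dtheta                  # dz = i R e^{i theta} d(theta)
        total += f(z) * dz
    return total

int_z = circle_integral(lambda z: z, radius=2.0)        # compare Eq. (14.3)
int_inv = circle_integral(lambda z: 1 / z, radius=2.0)  # compare Eq. (14.4)

print(abs(int_z))                  # magnitude, expected to be near zero
print(int_inv / (2j * math.pi))    # ratio to 2*pi*i, expected to be near one
```

Changing `radius` should leave both results unchanged, which is the first hint that the value depends on what is inside the contour and not on the contour itself.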

Rather than spending time on more examples of integrals, I’ll jump to a different subject. The main results about 
integrals will follow after that (the residue theorem). 

14.3 Power (Laurent) Series 

The series that concern us here are an extension of the common Taylor or power series, and they are of the form 


$$
\sum_{k=-\infty}^{+\infty} a_k (z - z_0)^k
\tag{14.5}
$$


The powers can extend through all positive and negative integer values. This is sort of like the Frobenius series that 
appear in the solution of differential equations, except that here the powers are all integers and they can either have a 
finite number of negative powers or the powers can go all the way to minus infinity. 

The common examples of Taylor series simply represent the case for which no negative powers appear. 


$$
\sin z = \sum_{0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}
\qquad\text{or}\qquad
J_0(z) = \sum_{0}^{\infty} (-1)^k \frac{z^{2k}}{2^{2k}(k!)^2}
\qquad\text{or}\qquad
\frac{1}{1-z} = \sum_{0}^{\infty} z^k
$$

If a function has a Laurent series expansion that has a finite number of negative powers, it is said to have a pole.


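$$
\frac{\cos z}{z} = \sum_{0}^{\infty} (-1)^k \frac{z^{2k-1}}{(2k)!}
\qquad\text{or}\qquad
\frac{\sin z}{z^3} = \sum_{0}^{\infty} (-1)^k \frac{z^{2k-2}}{(2k+1)!}
$$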


The order of the pole is the size of the largest negative power. These have respectively first order and second order poles. 

If the function has an infinite number of negative powers, and the series converges all the way down to (but of 
course not at) the singularity, it is said to have an essential singularity. 


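$$
e^{1/z} = \sum_{0}^{\infty} \frac{1}{k!\, z^k}
\qquad\text{or}\qquad
\sin\Bigl[t\Bigl(z + \frac{1}{z}\Bigr)\Bigr]
\qquad\text{or}\qquad
\frac{1}{1-z} = \frac{1}{z}\,\frac{-1}{1 - \frac{1}{z}} = -\sum_{1}^{\infty} z^{-k}
$$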


The first two have essential singularities; the third does not. 

It's worth examining some examples of these series, especially to see what kinds of singularities they have.
In analyzing these I'll use the fact that the familiar power series derived for real variables apply here too: the binomial
series, the trigonometric functions, the exponential, and many more.

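$1/z(z-1)$ has a zero in the denominator for both $z = 0$ and $z = 1$. What is the full behavior near these two
points?

$$
\frac{1}{z(z-1)} = \frac{-1}{z}\,\frac{1}{1-z} = \frac{-1}{z}\bigl[1 + z + z^2 + z^3 + \cdots\bigr] = -\frac{1}{z} - 1 - z - z^2 - \cdots
$$
$$
\frac{1}{z(z-1)} = \frac{1}{(z-1)(1 + z - 1)} = \frac{1}{z-1}\bigl[1 + (z-1)\bigr]^{-1}
= \frac{1}{z-1}\bigl[1 - (z-1) + (z-1)^2 - (z-1)^3 + \cdots\bigr] = \frac{1}{z-1} - 1 + (z-1) - \cdots
$$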


This shows the full Laurent series expansions near these points. Keep your eye on the coefficient of the inverse first 
power. That term alone plays a crucial role in what will follow. 
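$\csc^3 z$ near $z = 0$:

$$
\frac{1}{\sin^3 z} = \frac{1}{\bigl[z - \frac{z^3}{6} + \frac{z^5}{120} - \cdots\bigr]^3}
= \frac{1}{z^3\bigl[1 - \frac{z^2}{6} + \frac{z^4}{120} - \cdots\bigr]^3}
= \frac{1}{z^3}\bigl[1 + x\bigr]^{-3} = \frac{1}{z^3}\bigl[1 - 3x + 6x^2 - 10x^3 + \cdots\bigr]
$$
$$
= \frac{1}{z^3}\Bigl[1 - 3\Bigl(-\frac{z^2}{6} + \frac{z^4}{120} - \cdots\Bigr) + 6\Bigl(-\frac{z^2}{6} + \frac{z^4}{120} - \cdots\Bigr)^2 - \cdots\Bigr]
$$
$$
= \frac{1}{z^3}\Bigl[1 + \frac{z^2}{2} + z^4\Bigl(\frac{6}{36} - \frac{3}{120}\Bigr) + \cdots\Bigr]
= \frac{1}{z^3}\Bigl[1 + \frac{1}{2}z^2 + \frac{17}{120}z^4 + \cdots\Bigr]
\tag{14.6}
$$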


This has a third order pole, and the coefficient of $1/z$ is $1/2$. Are there any other singularities for this function? Yes,
every place that the sine vanishes you have a pole: at $n\pi$. (What is the order of these other poles?) As I commented
above, you'll soon see that the coefficient of the $1/z$ term plays a special role, and if that's all that you're looking for
you don't have to work this hard. Now that you've seen what various terms do in this expansion, you can stop carrying
along so many terms and still get the $1/2z$ term. See problem 14.17.
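
The expansion (14.6) is easy to test numerically. The sketch below makes two independent checks: it compares $z^3/\sin^3 z$ with the bracketed polynomial at a small $z$, and it extracts the coefficient of $1/z$ by dividing a contour integral around a small circle by $2\pi i$, the same kind of integral as in Eq. (14.4). The helper name `residue_at`, the radius 0.5, and the sample count are assumptions made for the illustration.

```python
import cmath
import math

def residue_at(f, z0, radius=0.5, n=4000):
    """Estimate the coefficient of 1/(z - z0) as (1/(2 pi i)) times the
    counterclockwise integral of f around a small circle centered on z0.
    Hypothetical helper; the circle must not enclose any other singularity."""
    total = 0j
    dtheta = 2 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        offset = radius * cmath.exp(1j * theta)
        total += f(z0 + offset) * 1j * offset * dtheta   # f(z) dz
    return total / (2j * math.pi)

def csc_cubed(z):
    return 1 / cmath.sin(z) ** 3

# Check 1: the bracket in Eq. (14.6), through the z^4 term.
z = 0.1
bracket_exact = z ** 3 * csc_cubed(z)
bracket_series = 1 + z ** 2 / 2 + 17 * z ** 4 / 120
series_error = abs(bracket_exact - bracket_series)   # should be of order z^6

# Check 2: the coefficient of 1/z, expected to be 1/2.
coefficient = residue_at(csc_cubed, 0.0)
print(series_error, coefficient)
```

The radius has to stay below $\pi$ so that the circle does not reach the neighboring poles of $\csc^3 z$.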

The structure of a Laurent series is such that it will converge in an annulus. Examine the absolute convergence 
of such a series. 

$$
\sum_{-\infty}^{\infty} \bigl|a_k z^k\bigr| = \sum_{-\infty}^{-1} \bigl|a_k z^k\bigr| + \sum_{0}^{\infty} \bigl|a_k z^k\bigr|
$$


The ratio test on the second sum is: if for large enough positive $k$,

$$
\frac{|a_{k+1}||z|^{k+1}}{|a_k||z|^k} = \frac{|a_{k+1}|}{|a_k|}\,|z| \le x < 1
\tag{14.7}
$$

then the series converges. The smallest such $x$ defines the upper bound of the $|z|$ for which the sum of positive powers
converges. If $|a_{k+1}|/|a_k|$ has a limit then $|z|_{\max} = \lim |a_k|/|a_{k+1}|$.

Do the same analysis for the series of negative powers, applying the ratio test: if for large enough negative $k$, you have

$$
\frac{|a_{k-1}||z|^{k-1}}{|a_k||z|^k} = \frac{|a_{k-1}|}{|a_k|}\,\frac{1}{|z|} \le x < 1
\tag{14.8}
$$

then the series converges. The largest such $x$ defines the lower bound of those $|z|$ for which the sum of negative powers
converges. If $|a_{k-1}|/|a_k|$ has a limit as $k \to -\infty$ then $|z|_{\min} = \lim |a_{k-1}|/|a_k|$.

If $|z|_{\min} < |z|_{\max}$ then there is a range of $z$ for which the series converges absolutely (and so of course it converges).

If either of these series of positive or negative powers is finite, terminating in a polynomial, then respectively $|z|_{\max} = \infty$
or $|z|_{\min} = 0$.
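
To see the annulus appear in a concrete case, here is a small sketch using a function that is not in the text, chosen only because its coefficients are simple: $1/\bigl((z-1)(z-2)\bigr)$ expanded about the origin in the region between its two poles. There $a_k = -1/2^{k+1}$ for $k \ge 0$ and $a_k = -1$ for $k \le -1$, so the ratio tests above give $|z|_{\max} = 2$ and $|z|_{\min} = 1$. The function names are hypothetical.

```python
def laurent_partial_sum(z, kmax):
    """Partial sum of the Laurent series of 1/((z-1)(z-2)) about z = 0,
    the one that is valid in the annulus 1 < |z| < 2 (illustrative example)."""
    positive = sum(-(z ** k) / 2 ** (k + 1) for k in range(0, kmax + 1))
    negative = sum(-(z ** (-k)) for k in range(1, kmax + 1))
    return positive + negative

def exact(z):
    return 1 / ((z - 1) * (z - 2))

# Inside the annulus the partial sums settle down to the function.
z_inside = 1.5j
error_inside = abs(laurent_partial_sum(z_inside, 200) - exact(z_inside))

# Outside it the individual terms do not even go to zero.
term_outside = abs(2.5 ** 200 / 2 ** 201)     # size of the k = 200 term at |z| = 2.5
term_too_close = abs(0.5 ** (-200))           # size of the k = -200 term at |z| = 0.5

print(error_inside, term_outside, term_too_close)
```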

A major result is that when a function is analytic at a point (and so automatically in a neighborhood of that 
point) then it will have a Taylor series expansion there. The series will converge, and the series will converge to the given 
function. Is it possible for the Taylor series for a function to converge but not to converge to the expected function? Yes, 
for functions of a real variable it is. See problem 14.3. The important result is that for analytic functions of a complex 

variable this cannot happen, and all the manipulations that you would like to do will work. (Well, almost all.) 

14.4 Core Properties 

There are four closely intertwined facts about analytic functions. Each one implies the other three. For the term
"neighborhood" of $z_0$, take it to mean all points satisfying $|z - z_0| < r$ for some positive $r$.

1. The function has a single derivative in a neighborhood of $z_0$.

2. The function has an infinite number of derivatives in a neighborhood of $z_0$.

3. The function has a power series (non-negative exponents) expansion about $z_0$ and the series
converges to the specified function in a disk centered at $z_0$ and extending to the nearest
singularity. You can compute the derivative of the function by differentiating the series term-by-term.

4. All contour integrals of the function around closed paths in a neighborhood of $z_0$ are zero.

Item 3 is a special case of the result about Laurent series. There are no negative powers when the function is 

analytic at the expansion point. 

The second part of the statement, that it’s the presence of a singularity that stops the series from converging, 
requires some computation to prove. The key step in the proof is to show that when the series converges in the 
neighborhood of a point then you can differentiate term-by-term and get the right answer. Since you won’t have a 
derivative at a singularity, the series can't converge there. That important part of the proof is the one that I'll leave to 
every book on complex variables ever written. E.g. Schaum's outline on Complex Variables by Spiegel, mentioned in the 
bibliography. It's not hard, but it requires attention to detail.

Instead of a direct approach to all these ideas, I'll spend some time showing how they're related to each other.
The proofs that these are valid are not all that difficult, but I'm going to spend time on their applications instead. 

14.5 Branch Points 

The function $f(z) = \sqrt z$ has a peculiar behavior. You are so accustomed to it that you may not think of it as peculiar,
but simply an annoyance that you have to watch out for. It's double valued. The very definition of a function however
says that a function is single valued, so what is this? I'll leave the answer to this until later, section 14.7, but for now I'll
say that when you encounter this problem you have to be careful of the path along which you move, in order to avoid
going all the way around such a point.

14.6 Cauchy’s Residue Theorem 

This is the fundamental result for applications in physics. If a function has a Laurent series expansion about the point
$z_0$, the coefficient of the term $1/(z - z_0)$ is called the residue of $f$ at $z_0$. The residue theorem tells you the value of a
contour integral around a closed loop in terms of the residues of the function inside the loop.

$$
\oint_C f(z)\,dz = 2\pi i \sum_k \operatorname{Res}(f)\Big|_{z_k}
\tag{14.9}
$$

To make sense of this result I have to specify the hypotheses. The direction of integration is counter-clockwise. Inside
and on the simple closed curve defining the path of integration, $f$ is analytic except at isolated points of singularity
inside, where there is a Laurent series expansion. There are no branch points inside the curve. It says that at each
singularity $z_k$ inside the contour, find the residue; add them; the result (times $2\pi i$) is the value of the integral on the
left. The term "simple" closed curve means that it doesn't cross itself.

Why is this theorem true? The result depends on some of the core properties of analytic functions, especially
the fact that you can distort the contour of integration as long as you don't pass over a singularity. If there are several
isolated singularities inside a contour (poles or essential singularities), you can contract the contour $C_1$ to $C_2$ and then
further to loops around the separate singularities. The parts of $C_2$ other than the loops are pairs of line segments that
go in opposite directions so that the integrals along these pairs cancel each other.

The problem is now to evaluate the integrals around the separate singularities of the function, then add them. The
function being integrated is analytic in the neighborhoods of the singularities (by assumption they are isolated). That
means there is a Laurent series expansion around each, and that it converges all the way down to the singularity itself
(though not at it). Now at the singularity $z_k$ you have

$$
\oint \sum_{n=-\infty}^{\infty} a_n (z - z_k)^n\,dz
$$

The lower limit may be finite, but that just makes it easier. In problem 14.1 you found that the integral of $z^n$
counterclockwise about the origin is zero unless $n = -1$, in which case it is $2\pi i$. Integrating the individual terms of the
series then gives zero from all terms but one, and then it is $2\pi i a_{-1}$, which is $2\pi i$ times the residue of the function at
$z_k$. Add the results from all the singularities and you have the Residue theorem.

Example 1 

The integral of $1/z$ around a circle of radius $R$ centered at the origin is $2\pi i$. The Laurent series expansion of this
function is trivial: it has only one term. This reproduces Eq. (14.4). It also says that the integral around the same
path of $e^{1/z}$ is $2\pi i$. Write out the series expansion of $e^{1/z}$ to determine the coefficient of $1/z$.
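
The claim about $e^{1/z}$ can be checked directly, even though the singularity at the origin is essential and not a pole. This is a sketch under the assumption that a plain Riemann sum around the circle is accurate enough; the helper name `loop_integral` and the radii are illustrative.

```python
import cmath
import math

def loop_integral(f, radius, n=2000):
    """Riemann-sum estimate of the counterclockwise integral of f(z) dz
    around the circle |z| = radius (hypothetical helper)."""
    dtheta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(1j * (k + 0.5) * dtheta)
        total += f(z) * 1j * z * dtheta    # dz = i z d(theta)
    return total

# e^{1/z} = 1 + 1/z + 1/(2 z^2) + ..., so the coefficient of 1/z is 1.
essential = loop_integral(lambda z: cmath.exp(1 / z), radius=1.0)
simple_pole = loop_integral(lambda z: 1 / z, radius=1.0)

print(essential / (2j * math.pi))     # expected to be near 1
print(simple_pole / (2j * math.pi))   # expected to be near 1
```

Only the single term $1/z$ of the series contributes; all the other inverse powers integrate to zero around the loop.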

Example 2 

Another example. The integral of $1/(z^2 - a^2)$ around a circle centered at the origin and of radius $2a$.

You can do this integral two ways. First increase the radius of the circle, pushing it out toward infinity.
As there are no singularities along the way, the value of the integral is unchanged. The magnitude of
the function goes as $1/R^2$ on a large ($R \gg a$) circle, and the circumference is $2\pi R$. The product of
these goes to zero as $1/R$, so the value of the original integral (unchanged, remember) is zero.

Another way to do the integral is to use the residue theorem. There are two poles inside the contour, at $\pm a$. Look
at the behavior of the integrand near these two points.

$$
\frac{1}{z^2 - a^2} = \frac{1}{(z-a)(z+a)} = \frac{1}{(z-a)(2a + z - a)} \approx \frac{1}{2a(z-a)} \qquad [\text{near } +a]
$$
$$
\frac{1}{z^2 - a^2} = \frac{1}{(z+a)(z + a - 2a)} \approx \frac{1}{-2a(z+a)} \qquad [\text{near } -a]
$$

The integral is $2\pi i$ times the sum of the two residues, $2\pi i\,[1/2a + 1/(-2a)] = 0$, in agreement with the first method.

For a more interesting example, consider the integral

$$
\int_{-\infty}^{+\infty} \frac{e^{ikx}\,dx}{a^4 + x^4}
\tag{14.10}
$$

If these were squares instead of fourth powers, and it didn't have the exponential in it, you could easily find a trigonometric
substitution to evaluate it. This integral would be formidable though. To illustrate the method, I'll start with that easier
example, $\int dx/(a^2 + x^2)$.

Example 3 

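The function $1/(a^2 + z^2)$ is singular when the denominator vanishes, when $z = \pm ia$. The integral is the contour
integral along the $x$-axis.

$$
\int_{C_1} \frac{dz}{a^2 + z^2}
\tag{14.11}
$$

[Figure: the contour $C_1$ along the real axis, with the poles at $\pm ia$ marked.]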


The figure shows the two places at which the function has poles, $\pm ia$. The method is to move the contour around and
to take advantage of the theorems about contour integrals. First remember that as long as it doesn't move across a
singularity, you can distort a contour at will. I will push the contour $C_1$ up, but I have to leave the endpoints where they
are in order not to let the contour cross the pole at $ia$. Those are my sole constraints.

As I push the contour from $C_1$ up to $C_2$, nothing has changed, and the same applies to $C_3$. The next two steps
however, require some comment. In $C_3$ the two straight-line segments that parallel the $y$-axis are going in opposite
directions, and as they are squeezed together, they cancel each other; they are integrals of the same function in reverse
directions. In the final step, to $C_5$, I pushed the contour all the way to $+i\infty$ and eliminated it. How does that happen?
On a big circle of radius $R$, the function $1/(a^2 + z^2)$ has a magnitude approximately $1/R^2$. As you push the top curve
in $C_4$ out, forming a big circle, its length is $\pi R$. The product of these is $\pi/R$, and that approaches zero as $R \to \infty$.
All that is left is the single closed loop in $C_5$, and I evaluate that with the residue theorem.




$$
\int_{C_1} \frac{dz}{a^2 + z^2} = \int_{C_5} \frac{dz}{a^2 + z^2} = 2\pi i \operatorname*{Res}_{z = ia} \frac{1}{a^2 + z^2}
$$
$$
\frac{1}{a^2 + z^2} = \frac{1}{(z - ia)(z + ia)} \approx \frac{1}{(z - ia)(2ia)}
$$

Near the point $z = ia$ the value of $z + ia$ is nearly $2ia$, so the coefficient of $1/(z - ia)$ is $1/(2ia)$, and that is the
residue. The integral is $2\pi i$ times this residue, so

$$
\int_{-\infty}^{\infty} \frac{dx}{a^2 + x^2} = 2\pi i \cdot \frac{1}{2ia} = \frac{\pi}{a}
\tag{14.12}
$$


The most obvious check on this result is that it has the correct dimensions, $[dz/z^2] = L/L^2 = 1/L$, a reciprocal length
(assuming $a$ is a length). What happens if you push the contour down instead of up? See problem 14.10.
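
Besides the dimensional check, Eq. (14.12) is simple enough to confirm by brute force. The sketch below is one way to do it, assuming the substitution $x = \tan t$ to map the whole real axis onto a finite interval; the function name `integrate_whole_line` and the test value of $a$ are illustrative choices.

```python
import math

def integrate_whole_line(f, n=2000):
    """Estimate the integral of f(x) over the whole real axis.

    Hypothetical helper: substitutes x = tan(t), -pi/2 < t < pi/2, and applies
    the midpoint rule in t, so no sample ever lands exactly on t = +-pi/2.
    """
    dt = math.pi / n
    total = 0.0
    for j in range(n):
        t = -math.pi / 2 + (j + 0.5) * dt
        x = math.tan(t)
        total += f(x) / math.cos(t) ** 2 * dt    # dx = sec^2(t) dt
    return total

a = 3.0
numeric = integrate_whole_line(lambda x: 1.0 / (a * a + x * x))
from_residue = math.pi / a                       # Eq. (14.12)
print(numeric, from_residue)

# Doubling a should halve the result, as the 1/L dimensions require.
ratio = integrate_whole_line(lambda x: 1.0 / (4 * a * a + x * x)) / numeric
print(ratio)
```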


Example 4 

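How about the more complicated integral, Eq. (14.10)? There are more poles, so that's where to start. The denominator
vanishes where $z^4 = -a^4$, or at

$$
z = a\bigl(e^{i\pi + 2in\pi}\bigr)^{1/4} = a e^{i\pi/4} e^{in\pi/2}
$$
$$
\int_{C_1} \frac{e^{ikz}\,dz}{a^4 + z^4}
$$

[Figure: the contour $C_1$ along the real axis and the four poles, each marked with an ×.]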

I'm going to use the same method as before, pushing the contour past some poles, but I have to be a bit more careful
this time. The exponential, not the $1/z^4$, will play the dominant role in the behavior at infinity. If $k$ is positive then if
$z = iy$, the exponential $e^{ikz} = e^{-ky} \to 0$ as $y \to +\infty$. It will blow up in the $-i\infty$ direction. Of course if $k$ is negative
the reverse holds.

Assume $k > 0$, then in order to push the contour into a region where I can determine that the integral along it is
zero, I have to push it toward $+i\infty$. That's where the exponential drops rapidly to zero. It goes to zero faster than any
inverse power of $y$, so even with the length of the contour going as $\pi R$, the combination vanishes.

As before, when you push $C_1$ up to $C_2$ and to $C_3$, nothing has changed, because the contour has crossed no
singularities. The transition to $C_4$ happens because the pairs of straight line segments cancel when they are pushed
together and made to coincide. The large contour is pushed to $+i\infty$ where the negative exponential kills it. All that's
left is the sum over the two residues at $ae^{i\pi/4}$ and $ae^{3i\pi/4}$.


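$$
\int_{C_1} = \int_{C_4} = 2\pi i \sum \operatorname{Res} \frac{e^{ikz}}{a^4 + z^4}
$$

The denominator factors as

$$
a^4 + z^4 = (z - ae^{i\pi/4})(z - ae^{3i\pi/4})(z - ae^{5i\pi/4})(z - ae^{7i\pi/4})
$$

The residue at $ae^{i\pi/4} = a(1+i)/\sqrt2$ is the coefficient of $1/(z - ae^{i\pi/4})$, so it is

$$
\frac{e^{ika(1+i)/\sqrt2}}{(ae^{i\pi/4} - ae^{3i\pi/4})(ae^{i\pi/4} - ae^{5i\pi/4})(ae^{i\pi/4} - ae^{7i\pi/4})}
$$

[Figure: the four poles at the corners of a square centered at the origin, with the lines from the pole at $ae^{i\pi/4}$ to the other three poles labeled 1, 2, 3.]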

Do you have to do a lot of algebra to evaluate this denominator? Maybe you will prefer that to the alternative: draw a
picture. The distance from the center to a corner of the square is $a$, so each side has length $a\sqrt2$. The first factor in
the denominator of the residue is the line labeled "1" in the figure, so it is $a\sqrt2$. Similarly the second and third factors
(labeled in the diagram) are $2a(1+i)/\sqrt2$ and $ia\sqrt2$. This residue is then

$$
\operatorname*{Res}_{ae^{i\pi/4}} = \frac{e^{ika(1+i)/\sqrt2}}{(a\sqrt2)\bigl(2a(1+i)/\sqrt2\bigr)(ia\sqrt2)}
= \frac{e^{ika(1+i)/\sqrt2}}{a^3\, 2\sqrt2\,(-1+i)}
\tag{14.13}
$$

For the other pole, at $ae^{3i\pi/4}$, the result is

$$
\operatorname*{Res}_{ae^{3i\pi/4}} = \frac{e^{ika(-1+i)/\sqrt2}}{(-a\sqrt2)\bigl(2a(-1+i)/\sqrt2\bigr)(ia\sqrt2)}
= \frac{e^{ika(-1+i)/\sqrt2}}{a^3\, 2\sqrt2\,(1+i)}
\tag{14.14}
$$

The final result for the integral Eq. (14.10) is then the sum of these ($\times 2\pi i$)

$$
\int_{-\infty}^{+\infty} \frac{e^{ikx}\,dx}{a^4 + x^4} = 2\pi i\,\bigl[(14.13) + (14.14)\bigr]
= \frac{\pi e^{-ka/\sqrt2}}{a^3} \cos\bigl[(ka/\sqrt2) - \pi/4\bigr]
\tag{14.15}
$$

This would be a challenge to do by other means, without using contour integration. It is probably possible, but would be
much harder. Does the result make any sense? The dimensions work, because the $[dz/z^4]$ is the same as $1/a^3$. What
happens in the original integral if $k$ changes to $-k$? It's even in $k$ of course. (Really? Why?) This result doesn't look
even in $k$ but then it doesn't have to because it applies only for the case that $k > 0$. If you have a negative $k$ you can
do the integral again (problem 14.42) and verify that it is even.
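
A direct numerical integration is a useful sanity check on Eq. (14.15). The sketch below assumes that truncating the range at a large $|x| = L$ is acceptable because the integrand falls off as $1/x^4$, and it integrates only the cosine part since the sine part is odd and integrates to zero. The names `simpson` and `formula_14_15` and the particular values of $k$, $a$, and $L$ are illustrative.

```python
import math

def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with n subintervals (n must be even)."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3

def formula_14_15(k, a):
    """Right-hand side of Eq. (14.15); the derivation assumed k > 0."""
    kappa = k * a / math.sqrt(2)
    return math.pi * math.exp(-kappa) / a ** 3 * math.cos(kappa - math.pi / 4)

k, a = 1.3, 1.0
L = 60.0    # truncation point; the neglected tails are smaller than 2/(3 L^3)
numeric = simpson(lambda x: math.cos(k * x) / (a ** 4 + x ** 4), -L, L, 60000)
print(numeric, formula_14_15(k, a))
```

Setting `k` very small in `formula_14_15` gives a value close to $\pi/(\sqrt2\, a^3)$, the value of the integral without the exponential.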

Example 5 

Another example for which it's not immediately obvious how to use the residue theorem: 



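$$
\int_{-\infty}^{\infty} \frac{\sin ax}{x}\,dx
\tag{14.16}
$$

[Figure: the contour $C_1$ along the real axis, and the contour $C_2$, the same except for a small detour around the origin.]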


This function has no singularities. The sine doesn't, and the only place the integrand could have one is at zero. Near
that point, the sine itself is linear in $x$, so $(\sin ax)/x$ is finite at the origin. The trick in using the residue theorem here
is to create a singularity where there is none. Write the sine as a combination of exponentials, then the contour integral
along $C_1$ is the same as along $C_2$, and

$$
\int_{C_2} \frac{e^{iaz} - e^{-iaz}}{2iz}\,dz = \int_{C_2} \frac{e^{iaz}}{2iz}\,dz - \int_{C_2} \frac{e^{-iaz}}{2iz}\,dz
$$


I had to move the contour away from the origin in anticipation of this splitting of the integrand because I don't want to
try integrating through this singularity that appears in the last two integrals. In the first form it doesn't matter because
there is no singularity at the origin and the contour can move anywhere I want as long as the two points at $\pm\infty$ stay
put. In the final two separated integrals it matters very much.



Assume that $a > 0$. In this case, $e^{iaz} \to 0$ as $z \to +i\infty$. For the other exponential, it vanishes toward $-i\infty$.
This implies that I can push the contour in the first integral toward $+i\infty$ and the integral over the contour at infinity
will vanish. As there are no singularities in the way, that means that the first integral is zero. For the second integral
you have to push the contour toward $-i\infty$, and that hangs up on the pole at the origin. That integral is then

$$
-\int_{C_2} \frac{e^{-iaz}}{2iz}\,dz = -\oint \frac{e^{-iaz}}{2iz}\,dz = -(-2\pi i) \operatorname{Res} \frac{e^{-iaz}}{2iz} = \pi
$$

The factor $-2\pi i$ in front of the residue occurs because the integral is over a clockwise contour, thereby changing its
sign. Compare the result of problem 5.29(b).

Notice that the result is independent of a > 0. (And what if a < 0?) You can check this fact by going to the 
original integral, Eq. (14.16), and making a change of variables. See problem 14.16. 

Example 6 

What is $\int_0^\infty dx/(a^2 + x^2)^2$? The first observation I'll make is that by dimensional analysis alone, I expect the result to
vary as $1/a^3$. Next: the integrand is even, so using the same methods as in the previous examples, extend the integration
limits to the whole axis (times $1/2$).

$$
\frac{1}{2} \int_{C_1} \frac{dz}{(a^2 + z^2)^2}
$$

[Figure: the contour $C_1$ along the real axis.]

As with Eq. (14.11), push the contour up and it is caught on the pole at $z = ia$. That's curve $C_5$ following that equation.
This time however, the pole is second order, so it takes a (little) more work to evaluate the residue.


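$$
\frac{1}{2}\,\frac{1}{(a^2 + z^2)^2} = \frac{1}{2}\,\frac{1}{(z - ia)^2 (z + ia)^2} = \frac{1}{2}\,\frac{1}{(z - ia)^2 (z - ia + 2ia)^2}
$$
$$
= \frac{1}{2}\,\frac{1}{(z - ia)^2 (2ia)^2 \bigl[1 + (z - ia)/2ia\bigr]^2}
= \frac{1}{2}\,\frac{1}{(z - ia)^2 (2ia)^2} \Bigl[1 - 2\,\frac{(z - ia)}{2ia} + \cdots\Bigr]
$$
$$
= \frac{1}{2}\,\frac{1}{(z - ia)^2 (2ia)^2} + \frac{1}{2}(-2)\,\frac{1}{(z - ia)(2ia)^3} + \cdots
$$

The residue is the coefficient of the $1/(z - ia)$ term, so the integral is

$$
\int_0^\infty \frac{dx}{(a^2 + x^2)^2} = 2\pi i \cdot (-1) \cdot \frac{1}{(2ia)^3} = \frac{\pi}{4a^3}
$$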


Is this plausible? The dimensions came out as expected, and to estimate the size of the coefficient, $\pi/4$, look back at the
result Eq. (14.12). Set $a = 1$ and compare the $\pi$ there to the $\pi/4$ here. The range of integration is half as big, so that
accounts for a factor of two. The integrands are always less than one, so in the second case, where the denominator is
squared, the integrand is always less than that of Eq. (14.12). The integral must be less, and it is. Why less by a factor
of two? Dunno, but plot a few points and sketch a graph to see if you believe it. (Or use parametric differentiation to
relate the two.)
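
Both suggestions in the last paragraph can be tried numerically. The sketch below computes the integral directly and also through parametric differentiation, using $\partial/\partial a$ of $1/(a^2 + x^2)$, which is $-2a/(a^2 + x^2)^2$. The helper names, the substitution $x = \tan t$, and the finite-difference step are assumptions made for this illustration.

```python
import math

def half_line_integral(f, n=2000):
    """Estimate the integral of f(x) from 0 to infinity using x = tan(t),
    0 < t < pi/2, with the midpoint rule in t (hypothetical helper)."""
    dt = (math.pi / 2) / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * dt
        total += f(math.tan(t)) / math.cos(t) ** 2 * dt   # dx = sec^2(t) dt
    return total

def i1(a):
    """Integral of 1/(a^2 + x^2) over 0..infinity; half of Eq. (14.12)."""
    return half_line_integral(lambda x: 1.0 / (a * a + x * x))

def i2(a):
    """Integral of 1/(a^2 + x^2)^2 over 0..infinity, the one in this example."""
    return half_line_integral(lambda x: 1.0 / (a * a + x * x) ** 2)

a = 2.0
direct = i2(a)
from_residue = math.pi / (4 * a ** 3)

# Parametric differentiation: d(i1)/da = -2a * i2, estimated by a central difference.
h = 1e-4
from_derivative = -(i1(a + h) - i1(a - h)) / (2 * h) / (2 * a)

print(direct, from_residue, from_derivative)
```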


Example 7 

A trigonometric integral: $\int_0^{2\pi} d\theta/(a + b\cos\theta)$. The first observation is that unless $|a| > |b|$ then this denominator will
go to zero somewhere in the range of integration (assuming that $a$ and $b$ are real). Next, the result can't depend on the
relative sign of $a$ and $b$, because the change of variables $\theta' = \theta + \pi$ changes the sign of the coefficient of $b$ while the periodicity of
the cosine means that you can leave the limits alone. I may as well assume that $a$ and $b$ are positive. The trick now is
to use Euler's formula and express the cosine in terms of exponentials.


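$$
\text{Let } z = e^{i\theta}, \quad\text{then}\quad \cos\theta = \frac{1}{2}\Bigl[z + \frac{1}{z}\Bigr] \quad\text{and}\quad dz = i e^{i\theta}\,d\theta = iz\,d\theta
$$

As $\theta$ goes from 0 to $2\pi$, the complex variable $z$ goes around the unit circle. The integral is then

$$
\int_0^{2\pi} \frac{d\theta}{a + b\cos\theta} = \int_C \frac{dz}{iz}\,\frac{1}{a + b\bigl(z + \frac{1}{z}\bigr)/2}
$$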


The integrand obviously has some poles, so the first task is to locate them.

$$
2az + bz^2 + b = 0 \quad\text{has roots}\quad z = \frac{-2a \pm \sqrt{(2a)^2 - 4b^2}}{2b} = z_\pm
$$

Because $a > b$, the roots are real. The important question is: Are they inside or outside the unit circle? The roots
depend on the ratio $a/b = \lambda$.

$$
z_\pm = -\lambda \pm \sqrt{\lambda^2 - 1}
\tag{14.17}
$$

As $\lambda$ varies from 1 to $\infty$, the two roots travel from $-1 \to -\infty$ and from $-1 \to 0$, so $z_+$ stays inside the unit circle
(problem 14.19). The integral is then

$$
\frac{2}{ib} \int_C \frac{dz}{z^2 + 2\lambda z + 1} = \frac{2}{ib} \int_C \frac{dz}{(z - z_+)(z - z_-)} = \frac{2}{ib}\, 2\pi i \operatorname*{Res}_{z = z_+}
$$
$$
= \frac{2}{ib}\, 2\pi i\, \frac{1}{z_+ - z_-} = \frac{2\pi}{b\sqrt{\lambda^2 - 1}} = \frac{2\pi}{\sqrt{a^2 - b^2}}
$$
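
The result of Example 7, and the location of the two roots, can be confirmed with a few lines of arithmetic. This is a sketch with arbitrarily chosen values $a = 5$, $b = 3$; the function name `theta_integral` is illustrative.

```python
import math

def theta_integral(a, b, n=2000):
    """Midpoint-rule estimate of the integral of 1/(a + b cos(theta))
    over one full period, 0 to 2 pi. Requires |a| > |b|."""
    dtheta = 2 * math.pi / n
    return sum(dtheta / (a + b * math.cos((j + 0.5) * dtheta)) for j in range(n))

a, b = 5.0, 3.0
numeric = theta_integral(a, b)
from_residue = 2 * math.pi / math.sqrt(a * a - b * b)

# Roots of z^2 + 2*lam*z + 1 = 0, as in Eq. (14.17).
lam = a / b
z_plus = -lam + math.sqrt(lam * lam - 1)
z_minus = -lam - math.sqrt(lam * lam - 1)

print(numeric, from_residue)
print(z_plus, z_minus, z_plus * z_minus)
```

The product of the two roots is the constant term of the quadratic, 1, so exactly one of them can lie inside the unit circle.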

14.7 Branch Points

Before looking at any more uses of the residue theorem, I have to return to the subject of branch points. They are
another type of singularity that an analytic function can have after poles and essential singularities. $\sqrt z$ provides the
prototype.

The definition of the word function, as in section 12.1, requires that it be single-valued. The function $\sqrt z$ stubbornly
refuses to conform to this. You can get around this in several ways: First, ignore it. Second, change the definition of
"function" to allow it to be multiple-valued. Third, change the domain of the function.

You know I’m not going to ignore it. Changing the definition is not very fruitful. The third method was pioneered 
by Riemann and is the right way to go. 

The complex plane provides a geometric picture of complex numbers, but when you try to handle square roots 
it becomes a hindrance. It isn’t adequate for the task. There are several ways to develop the proper extension, and I'll 
show a couple of them. The first is a sort of algebraic way, and the second is a geometric interpretation of the first way. 
There are other, even more general methods, leading into the theory of Riemann Surfaces and their topological structure, 
but I won't go into those. 

Pick a base point, say $z_0 = 1$, from which to start. This will be a kind of fiduciary point near which I know the
values of the function. Every other point needs to be referred to this base point. If I state that the square root of $z_0$ is
one, then I haven't run into trouble yet. Take another point $z = re^{i\theta}$ and try to figure out the square root there.

$$
\sqrt z = \sqrt{re^{i\theta}} = \sqrt r\, e^{i\theta/2}
\qquad\text{or}\qquad
\sqrt z = \sqrt{re^{i(\theta + 2\pi)}} = \sqrt r\, e^{i\theta/2} e^{i\pi}
$$

■ In the picture, $z$ appears to be at about $1.5\,e^{0.6i}$ or so.

■ On the path labeled 0, the angle $\theta$ starts at zero at $z_0$ and increases to 0.6 radians, so $\sqrt r\, e^{i\theta/2}$ varies continuously
from 1 to about $1.25\,e^{0.3i}$.

■ On the path labeled 1, angle $\theta$ again starts at zero and increases to $0.6 + 2\pi$, so $\sqrt r\, e^{i\theta/2}$ varies continuously from 1
to about $1.25\,e^{(\pi + 0.3)i}$, which is minus the result along path #0.

■ On the path labeled 2, angle $\theta$ goes from zero to $0.6 + 4\pi$, and $\sqrt r\, e^{i\theta/2}$ varies from 1 to $1.25\,e^{(2\pi + 0.3)i}$ and that is
back to the same value as path #0.

■ For the path labeled $-3$, the angle is $0.6 - 6\pi$, resulting in the same value as path #1.

There are two classes of paths from $z_0$ to $z$, those that go around the origin an even number
of times and those that go around an odd number of times. The "winding number" $w$ is the name
given to the number of times that a closed loop goes counterclockwise around a point (positive
or negative), and if I take the path #1 and move it slightly so that it passes through $z_0$, you can
more easily see that the only difference between paths 0 and 1 is the single loop around the origin.

The value for the square root depends on two variables, $z$ and the winding number of the path.
Actually less than this, because it depends only on whether the winding number is even or odd:
$\sqrt z \to \sqrt{(z, w)}$.

In this notation then $z_0 \to (z_0, 0)$ is the base point, and the square root of that is one. The square root of
$(z_0, 1)$ is then minus one. Because the sole relevant question about the winding number is whether it is even or odd, it's
convenient simply to say that the second argument can take on the values either 0 or 1 and be done with it.
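
The dependence on the winding number can be watched happening. The sketch below follows the square root continuously along a path, starting from the value 1 at the base point $z_0 = 1$, by always choosing whichever of the two roots is closer to the previous value. That nearest-root rule is just one simple way to impose continuity, and the function names are hypothetical.

```python
import cmath
import math

def follow_sqrt(path_points, start_value):
    """Continue sqrt(z) along a path: at each point pick the root (+r or -r)
    that is nearest to the value at the previous point."""
    w = start_value
    for z in path_points:
        r = cmath.sqrt(z)                 # principal root; the other root is -r
        w = r if abs(r - w) <= abs(-r - w) else -r
    return w

def loop_about_origin(windings, steps_per_turn=400):
    """Points on the unit circle, starting next to z0 = 1 and going around the
    origin `windings` times (negative means clockwise), ending back at z0."""
    n = abs(windings) * steps_per_turn
    sign = 1 if windings >= 0 else -1
    return [cmath.exp(1j * sign * 2 * math.pi * j / steps_per_turn)
            for j in range(1, n + 1)]

once = follow_sqrt(loop_about_origin(1), 1.0)          # winding number 1
twice = follow_sqrt(loop_about_origin(2), 1.0)         # winding number 2
minus_three = follow_sqrt(loop_about_origin(-3), 1.0)  # winding number -3

print(once, twice, minus_three)
```

Odd winding numbers bring the function back to $z_0$ with the opposite sign and even ones restore it, which is why only $w$ modulo 2 matters.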


Geometry of Branch Points 

How do you picture such a structure? There’s a convenient artifice that lets you picture and manipulate functions with 
branch points. In this square root example, picture two sheets and slice both along some curve starting at the origin and 
going to infinity. As it's a matter of convenience how you draw the cut I may as well make it a straight line along the 
x-axis, but any other line (or simple curve) from the origin will do. As these are mathematical planes I’ll use mathematical 
scissors, which have the elegant property that as I cut starting from infinity on the right and proceeding down to the 
origin, the points that are actually on the x-axis are placed on the right side of the cut and the left side of the cut is left 
open. Indicate this with solid and dashed lines in the figure. (This is not an important point; don't worry about it.) 


[Figure: the two cut sheets, with the edges of the cuts labeled a and b.]



Now sew the sheets together along these cuts. Specifically, sew the top edge from sheet #0 to the bottom edge
from sheet #1. I then sew the bottom edge of sheet #0 to the top edge of sheet #1. This sort of structure is called a 
Riemann surface. How to do this? Do it the same way that you read a map in an atlas of maps. If page 38 of the atlas 
shows a map with the outline of Brazil and page 27 shows a map with the outline of Bolivia, you can flip back and forth 
between the two pages and understand that the two maps* represent countries that are touching each other along their 
common border. 


* www.worldatlas.com

You can see where they fit even though the two countries are not drawn to the same scale. Brazil is a whole lot
larger than Bolivia, but where the images fit along the Western border of Brazil and the Eastern border of Bolivia is 
clear. You are accustomed to doing this with maps, understanding that the right edge of the map on page 27 is the 
same as the left edge of the map on page 38 — you probably take it for granted. Now you get to do it with Riemann 
surfaces. 

You have two cut planes (two maps), and certain edges are understood to be identified as identical, just as two 
borders of a geographic map are understood to represent the same line on the surface of the Earth. Unlike the maps 
above, you will usually draw both to the same scale, but you won't make the cut ragged (no pinking shears) so you need 
to use some notation to indicate what is attached to what. That's what the letters a and b are. Side a is the same 
as side a. The same for b. When you have more complicated surfaces, arising from more complicated functions of the 
complex variable with many branch points, you will have a fine time sorting out the shape of the surface.

I drew three large disks on this Riemann surface. One is entirely within the first sheet (the first map); a second is
entirely within the second sheet. The third disk straddles the two, but it is nonetheless a disk. On a political map this
might be disputed territory. Going back to the original square root example, I also indicated the initial point at which
to define the value of the square root, $(z_0, 0)$, and because a single dot would really be invisible I made it a little disk,
which necessarily extends across both sheets.

Here is a picture of a closed loop on this surface. I’ll probably not ask you to do contour integrals along such 
curves though. 




Other Functions 

Cube Root Take the next simple step. What about the cube root? Answer: Do exactly the same thing, except that
you need three sheets to describe the whole surface. Again, I'll draw a closed loop. As long as you have a single branch point
it's no more complicated than this.

Logarithm How about a logarithm? $\ln z = \ln(re^{i\theta}) = \ln r + i\theta$. There's a branch point at the origin, but this time, as
the angle keeps increasing you never come back to a previous value. This requires an infinite number of sheets. That
number isn't any more difficult to handle; it's just like two, only bigger. In this case the whole winding number around
the origin comes into play because every loop around the origin, taking you to the next sheet of the surface, adds another
$2\pi i w$, and $w$ is any integer from $-\infty$ to $+\infty$. The picture of the surface is like that for the cube root, but with infinitely
many sheets instead of three. The complications start to come when you have several branch points.

Two Square Roots Take $\sqrt{z^2 - 1}$ for an example. Many other functions will do just as well. Pick a base point $z_0$; I'll
take 2. (Not two base points, the number 2.) $f(z_0, 0) = \sqrt3$. Now follow the function around some loops. This repeats
the development as for the single branch, but the number of possible paths will be larger. Draw a closed loop starting
at $z_0$.




Despite the two square roots, you still need only two sheets to map out this surface. I drew the $ab$ and $cd$ cuts
below to keep them out of the way, but they're very flexible. Start at the base point and follow the path around the point
$+1$; that takes you to the second sheet. You already know that if you go around $+1$ again it takes you back to where
you started, so explore a different path: go around $-1$. Now observe that this function is the product of two square
roots. Going around the first one introduced a factor of $-1$ into the function and going around the second branch point
will introduce a second identical factor. As $(-1)^2 = +1$, when you return to $z_0$ the function is back at $\sqrt{3}$,
you have returned to the base point and this whole loop is closed. If this were the sum of two square roots instead of
their product, this wouldn't work. You'll need four sheets to map that surface. See problem 14.22.
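The factor-of-$(-1)$ argument can be checked by following each square root continuously around a loop. This is a sketch only: the helper names `circle` and `track_root`, the two circular loops, and the step count are assumptions made for illustration. A loop enclosing both branch points stands in for "around $+1$, then around $-1$".

```python
import cmath
import math

def circle(center, radius, steps=4000):
    """One counterclockwise loop that starts and ends at center + radius."""
    return [center + radius * cmath.exp(2j * math.pi * k / steps)
            for k in range(1, steps + 1)]

def track_root(path, branch_point, start_value):
    """Follow sqrt(z - branch_point) continuously along the path."""
    value = start_value
    for z in path:
        candidate = cmath.sqrt(z - branch_point)
        # Of the two square roots, keep the one nearer the previous value.
        if abs(candidate - value) > abs(candidate + value):
            candidate = -candidate
        value = candidate
    return value

# Base point z0 = 2: sqrt(z0 - 1) = 1 and sqrt(z0 + 1) = sqrt(3).
start_plus, start_minus = 1.0, math.sqrt(3.0)

loops = {
    "around +1 only": circle(1.0, 1.0),
    "around both +1 and -1": circle(0.0, 2.0),
}
for name, path in loops.items():
    r_plus = track_root(path, +1.0, start_plus)
    r_minus = track_root(path, -1.0, start_minus)
    print(name, "| product:", r_plus * r_minus, "| sum:", r_plus + r_minus)
```

Around $+1$ alone only one factor changes sign, so the product lands on the other sheet. Around both, the product returns to $\sqrt{3}$ while the sum comes back negated, which is the reason the sum needs more sheets.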

These cuts are rather awkward, and now that I know the general shape of the surface it’s possible to arrange the 
maps into a more orderly atlas. Here are two better ways to draw the maps. They're much easier to work with. 
[Figure: two alternative ways to draw the pair of maps, sheets #0 and #1, with the edges along the cuts labeled a through f.]

I used the dash-dot line to indicate the cuts. In the right pair, the base point is on the right-hand solid line of 
sheet # 0. In the left pair, the base point is on the c part of sheet #0. See problem 14.20. 

14.8 Other Integrals 

There are many more integrals that you can do using the residue theorem, and some of these involve branch points. In 
some cases, the integrand you're trying to integrate has a branch point already built into it. In other cases you can pull 
some tricks and artificially introduce a branch point to facilitate the integration. That doesn't sound likely, but it can
happen.


Example 8 

The integral $\int_0^\infty dx\,x/(a+x)^3$. You can do this by elementary methods (very easily in fact), but I'll use it to demonstrate
a contour method. This integral is from zero to infinity and it isn't even, so the previous tricks don't seem to apply.
Instead, consider the integral ($a > 0$)

$$\int_0^\infty dx\,\ln x\,\frac{x}{(a+x)^3}$$

and you see that right away, I'm creating a branch point where there wasn't one before.



The fact that the logarithm goes to infinity at the origin doesn't matter because it is such a weak singularity that
any positive power, even $x^{0.0001}$, times the logarithm gives a finite limit as $x \to 0$. Take advantage of the branch point
that this integrand provides.

On $C_1$ the logarithm is real. After the contour is pushed into position $C_2$, there are several distinct pieces. A part of $C_2$
is a large arc that I can take to be a circle of radius $R$. The size of the integrand is only as big as $(\ln R)/R^2$, and when
I multiply this by $2\pi R$, the circumference of the arc, it will go to zero as $R \to \infty$.

The next pieces of $C_2$ to examine are the two straight lines between the origin and $-a$. The integrals along here are in
opposite directions, and there's no branch point intervening, so these two segments simply cancel each other.

What's left is $C_3$.


$$\int_0^\infty dx\,\ln x\,\frac{x}{(a+x)^3} = 2\pi i\operatorname*{Res}_{z=-a} + \int_0^\infty dx\,\bigl(\ln x + 2\pi i\bigr)\frac{x}{(a+x)^3}$$


Below the positive real axis, that is, below the cut, the logarithm differs from its original value by the constant $2\pi i$.
Among all these integrals, the integral with the logarithm on the left side of the equation appears on the right side too.
These terms cancel and you're left with

$$0 = 2\pi i\operatorname*{Res}_{z=-a} + \int_0^\infty dx\,2\pi i\,\frac{x}{(a+x)^3} \qquad\text{or}\qquad \int_0^\infty dx\,\frac{x}{(a+x)^3} = -\operatorname*{Res}_{z=-a}\,\ln z\,\frac{z}{(a+z)^3}$$

This is a third-order pole, so it takes a bit of work. First expand the log around —a. Here it’s probably easiest to plug 
into Taylor's formula for the power series and compute the derivatives of In z at —a. 


$$\ln z = \ln(-a) + (z+a)\frac{1}{-a} + \frac{(z+a)^2}{2!}\,\frac{-1}{(-a)^2} + \cdots$$

Which value of $\ln(-a)$ to take? That answer is dictated by how I arrived at the point $-a$ when I pushed the contour
from $C_1$ to $C_2$. That is, $\ln a + i\pi$.


$$-\ln z\,\frac{z}{(a+z)^3} = -\left[\ln a + i\pi - \frac{1}{a}(z+a) - \frac{1}{a^2}\frac{(z+a)^2}{2} + \cdots\right]\bigl[(z+a) - a\bigr]\frac{1}{(z+a)^3}$$

I'm interested solely in the residue, so look only for the coefficient of the power $1/(z+a)$. That is

$$-\left[-\frac{1}{a} - \frac{1}{2a^2}(-a)\right] = \frac{1}{2a}$$


Did you have to do all this work to get this answer? Absolutely not. This falls under the classic heading of using a 
sledgehammer as a fly swatter. It does show the technique though, and in the process I had an excuse to show that 
third-order poles needn't be all that intimidating. 
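As a cross-check on example 8, the residue can be computed numerically and compared with the integral itself. The code below is a sketch under stated assumptions: the helper names, the value $a = 2$, the circle radius, and the substitution $x = at/(1-t)$ are illustrative choices, not the text's method. The logarithm is built with its cut along the positive real axis so that $\ln(-a) = \ln a + i\pi$, as the contour argument requires.

```python
import cmath
import math

def log_cut_positive_axis(z):
    """ln z with the cut on the positive real axis, so ln(-a) = ln a + i*pi."""
    return cmath.log(-z) + 1j * math.pi

def residue(func, center, radius, n=512):
    """(1/(2*pi*i)) times the closed integral of func around a circle."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += func(center + w) * w      # dz = i*w*dtheta
    return total / n

def direct_integral(a, n=20000):
    """Midpoint rule for the integral of x/(a+x)^3 over 0..infinity,
    after the substitution x = a*t/(1-t) that maps it onto 0 < t < 1."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        x = a * t / (1.0 - t)
        dx_dt = a / (1.0 - t) ** 2
        total += x / (a + x) ** 3 * dx_dt
    return total / n

a = 2.0
res = residue(lambda z: log_cut_positive_axis(z) * z / (a + z) ** 3, -a, 0.5 * a)
print("minus the residue:", -res)
print("direct integral:  ", direct_integral(a))
print("1/(2a):           ", 1.0 / (2.0 * a))
```

The circle around $-a$ has radius $a/2$, so it stays well away from both the cut and the branch point at the origin.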
14.9 Other Results

Polynomials: There are some other consequences of looking in the complex plane that are very different from any of the
preceding. If you did problem 3.11, you realize that the equation $e^z = 0$ has no solutions, even in the complex plane. You
are used to finding roots of equations such as quadratics and maybe you've encountered the cubic formula too. How do
you know that every polynomial even has a root? Maybe there's an order-137 polynomial that has none. No, it doesn't
happen. That every polynomial has a root ($n$ of them in fact) is the Fundamental Theorem of Algebra. Gauss proved
it, but after the advent of complex variable theory it becomes an elementary exercise.

A polynomial is $f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0$. Consider the integral

$$\int_C dz\,\frac{f'(z)}{f(z)}$$

around a large circle. $f'(z) = na_nz^{n-1} + \cdots$, so this is

$$\int_C dz\,\frac{na_nz^{n-1} + (n-1)a_{n-1}z^{n-2} + \cdots}{a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0} = \int_C dz\,\frac{n}{z}\,\frac{1 + \dfrac{(n-1)a_{n-1}}{na_nz} + \cdots}{1 + \dfrac{a_{n-1}}{a_nz} + \cdots}$$

Take the radius of the circle large enough that only the first term in the numerator and the first term in the denominator
are important. That makes the integral

$$\int_C dz\,\frac{n}{z} = 2\pi i n$$

It is certainly not zero, so that means that there is a pole inside the loop, and so a root of the denominator.
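The integral of $f'/f$ is easy to evaluate numerically, which gives a concrete feel for the argument. This sketch makes its own assumptions: the example polynomial $z^5 - 3z^2 + 1$, the function names, and the number of sample points are arbitrary. Dividing the integral by $2\pi i$ should give a number close to the count of roots inside the circle.

```python
import cmath
import math

def horner(coeffs, z):
    """Evaluate a polynomial and its derivative; coeffs run from a_n down to a_0."""
    value, derivative = 0j, 0j
    for c in coeffs:
        derivative = derivative * z + value
        value = value * z + c
    return value, derivative

def count_roots(coeffs, radius, n=4096):
    """(1/(2*pi*i)) times the closed integral of f'/f around |z| = radius."""
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        f, fprime = horner(coeffs, z)
        total += fprime / f * z            # dz = i*z*dtheta
    return total / n

coeffs = [1, 0, 0, -3, 0, 1]               # z^5 - 3 z^2 + 1, an arbitrary example
for radius in (1.0, 10.0):
    print("radius", radius, "->", count_roots(coeffs, radius))
```

On the large circle the result is close to the degree, 5. On the unit circle it is close to 2, because only two of the five roots lie that near the origin.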

Function Determined by its Boundary Values: If a function is analytic throughout a simply connected domain and $C$
is a simple closed curve in this domain, then the values of $f$ inside $C$ are determined by the values of $f$ on $C$. Let $z$ be
a point inside the contour; then I will show

$$\frac{1}{2\pi i}\int_C dz'\,\frac{f(z')}{z'-z} = f(z) \tag{14.18}$$

Because $f$ is analytic in this domain I can shrink the contour to be an arbitrarily small curve $C_1$ around $z$, and because
$f$ is continuous, I can make the curve close enough to $z$ that $f(z') = f(z)$ to any desired accuracy. That implies that
the above integral is the same as

$$\frac{1}{2\pi i}\,f(z)\int_{C_1} dz'\,\frac{1}{z'-z} = f(z)$$

Eq. (14.18) is Cauchy's integral formula, giving the analytic function in terms of its boundary values.

Derivatives: You can differentiate Cauchy's formula any number of times.

$$\frac{d^nf(z)}{dz^n} = \frac{n!}{2\pi i}\int_C dz'\,\frac{f(z')}{(z'-z)^{n+1}} \tag{14.19}$$
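Equations (14.18) and (14.19) can be tested directly by sampling a function on a circle. The sketch below assumes a circular contour centered on the point $z$ and uses $e^z$ as the test function because every derivative is again $e^z$; the function name `cauchy_derivative` and the sample count are illustrative.

```python
import cmath
import math

def cauchy_derivative(func, z, order=0, radius=1.0, n=256):
    """n!/(2*pi*i) times the integral of func(z')/(z'-z)^(order+1) on a circle about z.

    With z' = z + w and dz' = i*w*dtheta the integral becomes the average of
    func(z + w) / w**order over the circle; order = 0 is Eq. (14.18) itself.
    """
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += func(z + w) / w ** order
    return math.factorial(order) * total / n

z = 0.3 + 0.2j
for order in range(4):
    # Each line should show two nearly equal complex numbers.
    print(order, cauchy_derivative(cmath.exp, z, order), cmath.exp(z))
```

Only values of the function on the circle enter the computation, which is the content of the statement that the boundary values determine the interior.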


Entire Functions: An entire function is one that has no singularities anywhere. $e^z$, polynomials, sines, cosines are such.
There's a curious and sometimes useful result about such functions. A bounded entire function is necessarily a constant.
For a proof, take two points, $z_1$ and $z_2$, and apply Cauchy's integral formula, Eq. (14.18).

$$f(z_1) - f(z_2) = \frac{1}{2\pi i}\int_C dz'\,f(z')\left[\frac{1}{z'-z_1} - \frac{1}{z'-z_2}\right] = \frac{1}{2\pi i}\int_C dz'\,f(z')\,\frac{z_1-z_2}{(z'-z_1)(z'-z_2)}$$

By assumption, $f$ is bounded, $|f(z)| \le M$. A basic property of complex numbers is that $|u+v| \le |u| + |v|$ for any
complex numbers $u$ and $v$. This means that in the defining sum for an integral,

$$\Bigl|\sum_k f(\zeta_k)\,\Delta z_k\Bigr| \le \sum_k \bigl|f(\zeta_k)\bigr|\,\bigl|\Delta z_k\bigr|, \qquad\text{so}\qquad \Bigl|\int f(z)\,dz\Bigr| \le \int |f(z)|\,|dz| \tag{14.20}$$

Apply this.

$$|f(z_1) - f(z_2)| \le \int |dz'|\,|f(z')|\left|\frac{z_1-z_2}{(z'-z_1)(z'-z_2)}\right| \le M|z_1-z_2|\int |dz'|\left|\frac{1}{(z'-z_1)(z'-z_2)}\right|$$

On a big enough circle of radius $R$, this becomes

$$|f(z_1) - f(z_2)| \le M|z_1-z_2|\,2\pi R\,\frac{1}{R^2} \to 0 \quad\text{as } R \to \infty$$

The left side doesn't depend on $R$, so $f(z_1) = f(z_2)$.
Exercises

1 Describe the shape of the function $e^z$ of the complex variable $z$. That is, where in the complex plane is this function
big? small? oscillating? what is its phase? Make crude sketches to help explain how it behaves as you move toward
infinity in many and varied directions. Indicate not only the magnitude, but something about the phase. Perhaps words
can be better than pictures?

2 Same as the preceding but for $e^{iz}$.

3 Describe the shape of the function $z^2$. Not just magnitude, other pertinent properties too such as phase, so you know
how it behaves.

4 Describe the shape of $i/z$.

5 Describe the shape of $1/(a^2 + z^2)$. Here you need to show what it does for large distances from the origin and for
small. Also near the singularities.

6 Describe the shape of $e^{iz^2}$.

7 Describe the shape of $\cos z$.

8 Describe the shape of $e^{ikz}$ where $k$ is a real parameter, $-\infty < k < \infty$.

9 Describe the shape of the Bessel function $J_0(z)$. Look up for example Abramowitz and Stegun chapter 9, sections 1
and 6. ($I_0(z) = J_0(iz)$)
Problems

14.1 Explicitly integrate $z^n\,dz$ around the circle of radius $R$ centered at the origin, just as in Eq. (14.4). The number
$n$ is any positive, negative, or zero integer.

14.2 Repeat the analysis of Eq. (14.3) but change it to the integral of $z^*\,dz$.

14.3 For the real-valued function of a real variable,

$$f(x) = \begin{cases} e^{-1/x^2} & (x \ne 0) \\ 0 & (x = 0) \end{cases}$$

Work out all the derivatives at $x = 0$ and so find the Taylor series expansion about zero. Does it converge? Does it
converge to $f$? You did draw a careful graph didn't you? Perhaps even put in some numbers for moderately small $x$.

14.4 (a) The function $1/(z - a)$ has a singularity (pole) at $z = a$. Assume that $|z| < |a|$, and write its series expansion
in powers of $z/a$. Next assume that $|z| > |a|$ and write the series expansion in powers of $a/z$.

(b) In both cases, determine the set of $z$ for which the series is absolutely convergent, replacing each term by its absolute
value. Also sketch these sets.

(c) Does your series expansion in $a/z$ imply that this function has an essential singularity at $z = 0$? Since you know
that it doesn't, what happened?

14.5 The function $1/(1 + z^2)$ has a singularity at $z = i$. Write a Laurent series expansion about that point. To do so,
note that $1 + z^2 = (z - i)(z + i) = (z - i)(2i + z - i)$ and use the binomial expansion to produce the desired series. (Or
you can find another, more difficult method.) Use the ratio test to determine the domain of convergence of this series.
Specifically, look for (and sketch) the set of $z$ for which the absolute values of the terms form a convergent series.

Ans: $|z - i| < 2$ OR $|z - i| > 2$ depending on which way you did the expansion. If you did one, find the other. If you
expanded in powers of $(z - i)$, try expanding in powers of $1/(z - i)$.

14.6 What is $\int_0^i dz/(1 - z^2)$? Ans: $i\pi/4$

14.7 (a) What is a Laurent series expansion about $z = 0$ with $|z| < 1$ to at least four terms for

$$\sin z/z^4 \qquad\text{and}\qquad e^z/z^2(1 - z)$$

(b) What is the residue at $z = 0$ for each function?

(c) Then assume $|z| > 1$ and find the Laurent series.

Ans: $|z| > 1$: $\sum_{-\infty}^{+\infty} z^n f(n)$, where $f(n) = -e$ if $n < -3$ and $f(n) = -\sum_{n+3}^{\infty} 1/k!$ if $n \ge -3$.


14.8 By explicit integration, evaluate the integrals around the counterclockwise loops: 



14.9 Evaluate the integral along the straight line from $a$ to $a + i\infty$: $\int e^{iz}\,dz$. Take $a$ to be real. Ans: $ie^{ia}$

14.10 (a) Repeat the contour integral Eq. (14.11), but this time push the contour down, not up. 

(b) What happens to the same integral if a is negative? And be sure to explain your answer in terms of the contour 
integrals, even if you see an easier way to do it. 


14.11 Carry out all the missing steps starting with Eq. (14.10) and leading to Eq. (14.15). 

14.12 Sketch a graph of Eq. (14.15) and for k < 0 too. What is the behavior of this function in the neighborhood of 
k = 0? (Careful!) 

14.13 In the integration of Eq. (14.16) the contour $C_2$ had a bump into the upper half-plane. What happens if the
bump is into the lower half-plane?

14.14 For the function in problem 14.7, $e^z/z^2(1 - z)$, do the Laurent series expansion about $z = 0$, but this time
assume $|z| > 1$. What is the coefficient of $1/z$ now? You should have no trouble summing the series that you get for
this. Now explain why this result is as it is. Perhaps review problem 14.1.

14.15 In the integration of Eq. (14.16) the contour $C_2$ had a bump into the upper half-plane, but the original function
had no singularity at the origin, so you can instead start with this curve and carry out the analysis. What answer do
you get?

14.16 Use contour integration to evaluate Eq. (14.16) for the case that $a < 0$.

Next, independently of this, make a change of variables in the original integral Eq. (14.16) in order to see if the answer 
is independent of a. In this part, consider two cases, a > 0 and a < 0. 

14.17 Recalculate the residue done in Eq. (14.6), but economize your labor. If all that you really want is the coefficient
of $1/z$, keep only the terms that you need in order to get it.

14.18 What is the order of all the other poles of the function $\csc^3 z$, and what is the residue at each pole?

14.19 Verify the location of the roots of Eq. (14.17).

14.20 Verify that the Riemann surfaces work as defined for the function $\sqrt{z^2 - 1}$ using the alternative maps in section
14.7.

14.21 Map out the Riemann surface for $\sqrt{z(z - 1)(z - 2)}$. You will need four sheets.

14.22 Map out the Riemann surface for $\sqrt{z} + \sqrt{z - 1}$. You will need four sheets.

14.23 Evaluate 



where C is a circle of radius R about the origin. 

14.24 Evaluate

$$\int_C dz\,\tan z$$

where $C$ is a circle of radius $n\pi$ about the origin. Ans: $-4\pi i n$

14.25 Evaluate the residues of these functions at their singularities, a, b, and c are distinct. Six answers: you should 
be able to do five of them in your head. 




1 
14.26 Evaluate the residue at the origin for the function



The result will be an infinite series, though if you want to express the answer in terms of a standard function you will
have to hunt. Ans: $I_0(2) = 2.2796$, a modified Bessel function.

14.27 Evaluate $dz/(a^4 + x^4)$, and to check, compare it to the result of Eq. (14.15).


14.28 Show that

$$\int_0^\infty dx\,\frac{\cos bx}{a^2 + x^2} = \frac{\pi}{2a}\,e^{-ab} \qquad (a,\ b > 0)$$


14.29 Evaluate ($a$ real)

$$\int_{-\infty}^{\infty} dx\,\frac{\sin^2 ax}{x^2}$$

Ans: $|a|\pi$

14.30 Evaluate

$$\int_{-\infty}^{\infty} dx\,\frac{\sin^2 bx}{x(a^2 + x^2)}$$


14.31 Evaluate the integral $\int_0^\infty dx\,\sqrt{x}/(a + x)^2$. Use the ideas of example 8, but without the logarithm. ($a > 0$)
Ans: $\pi/2\sqrt{a}$

14.32 Evaluate

$$\int_0^\infty dx\,\frac{\ln x}{a^2 + x^2}$$

(What happens if you consider $(\ln x)^2$?) Ans: $(\pi\ln a)/2a$

14.33 Evaluate ($A > 1$) by contour integration

$$\int_0^{2\pi} \frac{d\theta}{(A + \sin\theta)^2}$$

Ans: $2\pi A/(A^2 - 1)^{3/2}$


14.34 Evaluate 


Recall Eq. (2.19). Ans: n 2 nC n /2 2n 1 



dO sin 2n 0 


n(2n — l)!!/(2n)!! 


14.35 Evaluate the integral of problem 14.33 another way. Assume $A$ is large and expand the integrand in a power series
in $1/A$. Use the result of the preceding problem to evaluate the individual terms and then sum the resulting infinite
series. Will section 1.2 save you any work? Ans: Still $2\pi A/(A^2 - 1)^{3/2}$


14.36 Evaluate

$$\int_0^\infty dx\,\cos ax^2 \quad\text{and}\quad \int_0^\infty dx\,\sin ax^2 \quad\text{by considering}\quad \int_0^\infty dx\,e^{iax^2}$$

Push the contour of integration toward the 45° line. Ans: $\tfrac{1}{2}\sqrt{\pi/2a}$

14.37

$$f(z) = \frac{1}{z(z-1)(z-2)} - \frac{1}{z^2(z-1)^2(z-2)^2}$$

What is $\int_C dz\,f(z)$ about the circle $x^2 + y^2 = 9$?

14.38 Derive

$$\int_0^\infty dx\,\frac{1}{a^3 + x^3} = 2\pi\sqrt{3}/9a^2$$


14.39 Go back to problem 3.45 and find the branch points of the inverse sine function. 

14.40 What is the Laurent series expansion of $1/(1 + z^2)$ for small $|z|$? Again, for large $|z|$? What is the domain of
convergence in each case?

14.41 Examine the power series $\sum_0^\infty z^{n!}$. What is its radius of convergence? What is its behavior as you move out from
the origin along a radius at a rational angle? That is, $z = re^{i\pi p/q}$ for $p$ and $q$ integers. This result is called a natural
boundary.

14.42 Evaluate the integral Eq. (14.10) for the case $k < 0$. Combine this with the result in Eq. (14.15) and determine
if the overall function is even or odd in $k$ (or neither).

14.43 At the end of section 14.1 several differentiation formulas are mentioned. Derive them. 

14.44 Look at the criteria for the annulus of convergence of a Laurent series, Eqs. (14.7) and (14.8), and write down 
an example of a Laurent series that converges nowhere. 

14.45 Verify the integral of example 8 using elementary methods. It will probably take at least three lines to do. 

14.46 What is the power series representation for $f(z) = \sqrt{z}$ about the point $1 + i$? What is the radius of convergence
of this series? In the Riemann surface for this function as described in section 14.7, show the disk of convergence.


Fourier Analysis


Fourier series allow you to expand a function on a finite interval as an infinite series of trigonometric functions. What if 
the interval is infinite? That's the subject of this chapter. Instead of a sum over frequencies, you will have an integral. 

15.1 Fourier Transform 

For the finite interval you have to specify the boundary conditions in order to determine the particular basis that you're 
going to use. On the infinite interval you don't have this large set of choices. After all, if the boundary is infinitely far 
away, how can it affect what you're doing over a finite distance? But see section 15.6. 

In section 5.3 you have several boundary conditions listed that you can use on the differential equation $u'' = \lambda u$ and
that will lead to orthogonal functions on your interval. For the purposes here the easiest approach is to assume periodic
boundary conditions on the finite interval and then to take the limit as the length of the interval approaches infinity. On
$-L < x < +L$, the conditions on the solutions of $u'' = \lambda u$ are then $u(-L) = u(+L)$ and $u'(-L) = u'(+L)$. The
solution to this is most conveniently expressed as a complex exponential, Eq. (5.19)

$$u(x) = e^{ikx}, \quad\text{where}\quad u(-L) = e^{-ikL} = u(L) = e^{ikL}$$

This implies $e^{2ikL} = 1$, or $2kL = 2n\pi$, for integer $n = 0, \pm 1, \pm 2, \ldots$. With these solutions, the other condition,
$u'(-L) = u'(+L)$, is already satisfied. The basis functions are then

$$u_n(x) = e^{ik_nx} = e^{n\pi ix/L}, \quad\text{for } n = 0, \pm 1, \pm 2, \text{ etc.} \tag{15.1}$$


On this interval you have the Fourier series expansion

$$f(x) = \sum_{-\infty}^{\infty} a_nu_n(x), \quad\text{and}\quad \bigl\langle u_m, f\bigr\rangle = \Bigl\langle u_m, \sum_{-\infty}^{\infty} a_nu_n\Bigr\rangle = a_m\bigl\langle u_m, u_m\bigr\rangle$$

In the basis of Eq. (15.1) this normalization is $\langle u_m, u_m\rangle = 2L$.

Insert this into the series for $f$.

$$f(x) = \sum_{n=-\infty}^{\infty} \frac{\langle u_n, f\rangle}{\langle u_n, u_n\rangle}\,u_n(x) = \frac{1}{2L}\sum_{n=-\infty}^{\infty} \langle u_n, f\rangle\,u_n(x) \tag{15.2}$$

Now I have to express this in terms of the explicit basis functions in order to manipulate it. When you use the explicit
form you have to be careful not to use the same symbol ($x$) for two different things in the same expression. Inside the
$\langle u_n, f\rangle$ there is no "$x$" left over; it's the dummy variable of integration and it is not the same $x$ that is in the $u_n(x)$
at the end. Denote $k_n = \pi n/L$.


$$f(x) = \frac{1}{2L}\sum_{n=-\infty}^{\infty}\int_{-L}^{L} dx'\,u_n(x')^*f(x')\,u_n(x) = \frac{1}{2L}\sum_{n=-\infty}^{\infty}\int_{-L}^{L} dx'\,e^{-ik_nx'}f(x')\,e^{ik_nx}$$

Now for some manipulation: As $n$ changes by 1, $k_n$ changes by $\Delta k_n = \pi/L$.

$$\begin{aligned}
f(x) &= \frac{1}{2\pi}\sum_{n=-\infty}^{\infty}\frac{\pi}{L}\int_{-L}^{L} dx'\,e^{-ik_nx'}f(x')\,e^{ik_nx} \\
&= \frac{1}{2\pi}\sum_{n=-\infty}^{\infty} e^{ik_nx}\,\Delta k_n\int_{-L}^{L} dx'\,e^{-ik_nx'}f(x')
\end{aligned} \tag{15.3}$$

For a given value of $k$, define the integral

$$g_L(k) = \int_{-L}^{L} dx'\,e^{-ikx'}f(x')$$

If the function $f$ vanishes sufficiently fast as $x' \to \pm\infty$, this integral will have a limit as $L \to \infty$. Call that limit $g(k)$.
Look back at Eq. (15.3) and you see that for large $L$ the last factor will be approximately $g(k_n)$, where the approximation
becomes exact as $L \to \infty$. Rewrite that expression as

$$f(x) \approx \frac{1}{2\pi}\sum_{n=-\infty}^{\infty} e^{ik_nx}\,\Delta k_n\,g(k_n) \tag{15.4}$$

As $L \to \infty$, you have $\Delta k_n \to 0$, and that turns Eq. (15.4) into an integral.

$$f(x) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikx}g(k), \quad\text{where}\quad g(k) = \int_{-\infty}^{\infty} dx\,e^{-ikx}f(x) \tag{15.5}$$

The function $g$ is called* the Fourier transform of $f$, and $f$ is the inverse Fourier transform of $g$.

Examples 

For an example, take the step function

$$f(x) = \begin{cases} 1 & (-a < x < a) \\ 0 & (\text{elsewhere}) \end{cases} \quad\text{then}\quad g(k) = \int_{-a}^{a} dx\,e^{-ikx}\,1 = \frac{1}{-ik}\Bigl[e^{-ika} - e^{+ika}\Bigr] = \frac{2\sin ka}{k} \tag{15.6}$$

The first observation is of course that the dimensions check: If $dx$ is a length then so is $1/k$. After that, there
is only one parameter that you can vary, and that's $a$. As $a$ increases, obviously the width of the function $f$ increases,
but now look at $g$. The first place where $g(k) = 0$ is at $ka = \pi$. This value, $\pi/a$, decreases as $a$ increases. As $f$ gets
broader, $g$ gets narrower (and taller). This is a general property of these Fourier transform pairs.
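The broader-versus-narrower trade-off is easy to see with numbers. This is a small sketch with assumed names and sample values: the two widths $a = 1$ and $a = 3$, the test point $k = 0.7$, and the midpoint rule are illustrative. It compares a direct numerical integral with Eq. (15.6) and prints the peak height and first zero.

```python
import cmath
import math

def step_transform_numeric(k, a, n=4000):
    """Midpoint rule for the integral of exp(-i*k*x) over -a < x < a."""
    h = 2.0 * a / n
    total = 0j
    for j in range(n):
        x = -a + (j + 0.5) * h
        total += cmath.exp(-1j * k * x)
    return total * h

def step_transform_exact(k, a):
    """Eq. (15.6), with the k -> 0 limit 2a filled in."""
    return 2.0 * a if k == 0 else 2.0 * math.sin(k * a) / k

for a in (1.0, 3.0):
    print("a =", a, "| peak g(0) =", step_transform_exact(0.0, a),
          "| first zero at k =", math.pi / a)
    print("   numeric vs exact at k = 0.7:",
          step_transform_numeric(0.7, a), step_transform_exact(0.7, a))
```

Tripling $a$ triples the height of the central peak and moves its first zero in by a factor of three.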

Can you invert this Fourier transform, evaluating the integral of g to get back to /? Yes, using the method of 
contour integration this is very easy. Without contour integration it would be extremely difficult, and that is typically 
the case with these transforms; complex variable methods are essential to get anywhere with them. The same statement 
holds with many other transforms (Laplace, Radon, Mellin, Hilbert, etc. ) 

The inverse transform is

$$\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikx}\,\frac{2\sin ka}{k} = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikx}\,\frac{e^{ika} - e^{-ika}}{ik} = -i\int_{C_2}\frac{dk}{2\pi}\,\frac{1}{k}\Bigl[e^{ik(x+a)} - e^{ik(x-a)}\Bigr]$$

[Figure: the integration contours in the complex $k$-plane.]

1. If $x > +a$ then both $x + a$ and $x - a$ are positive, which implies that both exponentials vanish rapidly as $k \to +i\infty$.
Push the contour $C_2$ toward this direction and the integrand vanishes exponentially, making the integral zero.

2. If $-a < x < +a$, then only $x + a$ is positive. The integral of the first term is then zero by exactly the preceding
reasoning, but the other term has an exponential that vanishes as $k \to -i\infty$ instead, implying that you must push the
contour down toward $-i\infty$.

$$-i\int_{C_3}\frac{dk}{2\pi}\,\frac{1}{k}\Bigl[-e^{ik(x-a)}\Bigr] = +i\,\frac{1}{2\pi}\,(-1)\,2\pi i\operatorname*{Res}_{k=0}\frac{e^{ik(x-a)}}{k} = 1$$

The extra $(-1)$ factor comes because the contour is clockwise.

3. In the third domain, $x < -a$, both exponentials have the form $e^{-ik}$, requiring you to push the contour toward $-i\infty$.
The integrand now has both exponentials, so it is analytic at zero and there is zero residue. The integral vanishes and
the whole analysis takes you back to the original function, Eq. (15.6).

* Another common notation is to define $g$ with an integral $dx/\sqrt{2\pi}$. That will require a corresponding $dk/\sqrt{2\pi}$
in the inverse relation. It's more symmetric that way, but I prefer the other convention.

Another example of a Fourier transform, one that shows up often in quantum mechanics, is $f(x) = e^{-x^2/\sigma^2}$:

$$g(k) = \int_{-\infty}^{\infty} dx\,e^{-ikx}e^{-x^2/\sigma^2} = \int_{-\infty}^{\infty} dx\,e^{-ikx - x^2/\sigma^2}$$

The trick to doing this integral is to complete the square inside the exponent.

$$-ikx - x^2/\sigma^2 = -\frac{1}{\sigma^2}\Bigl[x^2 + \sigma^2ikx - \sigma^4k^2/4 + \sigma^4k^2/4\Bigr] = -\frac{1}{\sigma^2}\Bigl[(x + ik\sigma^2/2)^2 + \sigma^4k^2/4\Bigr]$$

The integral of $f$ is now

$$e^{-\sigma^2k^2/4}\int_{-\infty}^{\infty} dx'\,e^{-x'^2/\sigma^2} \quad\text{where}\quad x' = x + ik\sigma^2/2$$

The change of variables makes this a standard integral, Eq. (1.10), and the other factor, with the exponential of $k^2$,
comes outside the integral. The result is

$$g(k) = \sigma\sqrt{\pi}\,e^{-\sigma^2k^2/4} \tag{15.7}$$

This has the curious result that the Fourier transform of a Gaussian is* a Gaussian. 


* Another function has this property: the hyperbolic secant. Look up the quantum mechanical harmonic oscillator 
solution for an infinite number of others. 
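Equation (15.7) can be confirmed without completing any squares by integrating numerically. The sketch below rests on assumptions of its own: the function names, the cutoff at twelve widths, and the sample values $\sigma = 1.5$, $k = 2$ are arbitrary choices for illustration.

```python
import cmath
import math

def gaussian_transform_numeric(k, sigma, half_width=12.0, n=4000):
    """Trapezoid rule for the integral of exp(-i*k*x - x^2/sigma^2) over the line,
    truncated to |x| < half_width * sigma where the integrand is negligible."""
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / n
    total = 0j
    for j in range(n + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(-1j * k * x - x * x / sigma ** 2)
    return total * h

def gaussian_transform_exact(k, sigma):
    """Eq. (15.7): sigma * sqrt(pi) * exp(-sigma^2 k^2 / 4)."""
    return sigma * math.sqrt(math.pi) * math.exp(-sigma ** 2 * k ** 2 / 4.0)

sigma, k = 1.5, 2.0
print("numeric:", gaussian_transform_numeric(k, sigma))
print("exact:  ", gaussian_transform_exact(k, sigma))
```

The numerical result has a negligible imaginary part, as it must for an even real function.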





15.2 Convolution Theorem 

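What is the Fourier transform of the product of two functions? It is a convolution of the individual transforms. What
that means will come out of the computation. Take two functions $f_1$ and $f_2$ with Fourier transforms $g_1$ and $g_2$.

$$\begin{aligned}
\int_{-\infty}^{\infty} dx\,f_1(x)f_2(x)e^{-ikx} &= \int dx\int\frac{dk'}{2\pi}\,g_1(k')e^{ik'x}f_2(x)e^{-ikx} \\
&= \int\frac{dk'}{2\pi}\,g_1(k')\int dx\,e^{ik'x}f_2(x)e^{-ikx} \\
&= \int\frac{dk'}{2\pi}\,g_1(k')\int dx\,f_2(x)e^{-i(k-k')x} \\
&= \int_{-\infty}^{\infty}\frac{dk'}{2\pi}\,g_1(k')\,g_2(k-k')
\end{aligned} \tag{15.8}$$

The last expression (except for the $2\pi$) is called the convolution of $g_1$ and $g_2$.

$$\int_{-\infty}^{\infty} dx\,f_1(x)f_2(x)e^{-ikx} = \frac{1}{2\pi}\,(g_1 * g_2)(k) \tag{15.9}$$

The last line shows a common notation for the convolution of $g_1$ and $g_2$.

What is the integral of $|f|^2$ over the whole line?

$$\begin{aligned}
\int_{-\infty}^{\infty} dx\,f^*(x)f(x) &= \int dx\,f^*(x)\int\frac{dk}{2\pi}\,g(k)e^{ikx} \\
&= \int\frac{dk}{2\pi}\,g(k)\int dx\,f^*(x)e^{ikx} \\
&= \int\frac{dk}{2\pi}\,g(k)\left[\int dx\,f(x)e^{-ikx}\right]^* \\
&= \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,g(k)\,g^*(k)
\end{aligned} \tag{15.10}$$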

This is Parseval’s identity for Fourier transforms. There is an extension to it in problem 15.10. 
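Both results of this section, Eqs. (15.8) and (15.10), can be checked numerically on the Gaussian pair of Eq. (15.7). This is a sketch under assumptions: taking $f_1 = f_2 = e^{-x^2/\sigma^2}$, the value $\sigma = 1.3$, the integration cutoffs, and the helper names are all illustrative choices.

```python
import math

SIGMA = 1.3   # an arbitrary width for the test Gaussian

def trapezoid(func, lo, hi, n=4000):
    """Plain trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for j in range(1, n):
        total += func(lo + j * h)
    return total * h

def f(x):
    return math.exp(-x * x / SIGMA ** 2)

def g(k):
    """Transform of f from Eq. (15.7)."""
    return SIGMA * math.sqrt(math.pi) * math.exp(-SIGMA ** 2 * k * k / 4.0)

def transform_of_product(k):
    # f*f is even, so only the cosine part of exp(-i*k*x) survives.
    return trapezoid(lambda x: f(x) ** 2 * math.cos(k * x), -20.0, 20.0)

def convolution_side(k):
    # Right side of Eq. (15.8) with g1 = g2 = g.
    return trapezoid(lambda kp: g(kp) * g(k - kp), -30.0, 30.0) / (2.0 * math.pi)

k = 1.7
print("Eq. (15.8):  left", transform_of_product(k), " right", convolution_side(k))
print("Eq. (15.10): left", trapezoid(lambda x: f(x) ** 2, -20.0, 20.0),
      " right", trapezoid(lambda q: g(q) ** 2, -30.0, 30.0) / (2.0 * math.pi))
```

The factor $1/2\pi$ appears on the $k$ side in both checks, which follows the convention of Eq. (15.5).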
15.3 Time-Series Analysis

Fourier analysis isn't restricted to functions of $x$, sort of implying position. It's probably more often used in analyzing
functions of time. If you're presented with a complicated function of time, how do you analyze it? What information is
present in it? If that function of time is a sound wave you may choose to analyze it with your ears, and if it is music,
the frequency content is just what you will be listening for. That's Fourier analysis. The Fourier transform of the signal
tells you its frequency content, and sometimes subtle periodicities will show up in the transformed function even though
they aren't apparent in the original signal.

A function of time is $f(t)$ and its Fourier transform is

$$g(\omega) = \int_{-\infty}^{\infty} dt\,f(t)\,e^{i\omega t} \quad\text{with}\quad f(t) = \int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,g(\omega)\,e^{-i\omega t}$$

The sign convention in these equations appears backwards from the one in Eq. (15.5), and it is. One convention is as
good as the other, but in the physics literature you'll find this pairing more common because of the importance of waves.
A function $e^{i(kx - \omega t)}$ represents a wave with (phase) velocity $\omega/k$, and so moving to the right. You form a general wave
by taking linear combinations of these waves, usually an integral.

Example 

When you hear a musical note you will perceive it as having a particular frequency. It doesn't, and if the note has a very 
short duration it becomes hard to tell its* pitch. Only if its duration is long enough do you have a real chance to discern 
what note you're hearing. This is a reflection of the facts of Fourier transforms. 

If you hear what you think of as a single note, it will not last forever. It starts and it ends. Say it lasts from
$t = -T$ to $t = +T$, and in that interval it maintains the frequency $\omega_0$.

$$f(t) = Ae^{-i\omega_0t} \quad (-T < t < T) \tag{15.11}$$

The frequency analysis comes from the Fourier transform.

$$g(\omega) = \int_{-T}^{T} dt\,e^{i\omega t}Ae^{-i\omega_0t} = A\,\frac{e^{i(\omega-\omega_0)T} - e^{-i(\omega-\omega_0)T}}{i(\omega-\omega_0)} = 2A\,\frac{\sin(\omega-\omega_0)T}{(\omega-\omega_0)}$$

This is like the function of Eq. (15.6) except that its center is shifted. It has a peak at $\omega = \omega_0$ instead of at the origin
as in that case. The width of the function $g$ is determined by the time interval $T$. When $T$ is large, $g$ is narrow and high,
with a sharp localization near $\omega_0$. In the reverse case of a short pulse, the note is spread over a wide range of
frequencies, and you will find it difficult to tell by listening to it just what the main pitch
is supposed to be. This figure shows the frequency spectrum for two notes having the same nominal pitch, but one of
them lasts three times as long as the other before being cut off. It therefore has a narrower spread of frequencies.

* Think of a hemisemidemiquaver played at tempo prestissimo.
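The comparison of a short note with one three times as long can be made quantitative. The sketch below assumes a 440 Hz note and durations $T = 0.01$ s and $0.03$ s purely for illustration; the function names are likewise hypothetical. It evaluates the transform of Eq. (15.11) and reports the peak height and the spacing between the first zeros on either side of $\omega_0$.

```python
import cmath
import math

def note_spectrum(omega, omega0, T, A=1.0):
    """g(omega) = 2 A sin((omega - omega0) T) / (omega - omega0); the limit 2 A T at the peak."""
    d = omega - omega0
    return 2.0 * A * T if d == 0 else 2.0 * A * math.sin(d * T) / d

def note_spectrum_numeric(omega, omega0, T, A=1.0, n=20000):
    """Midpoint rule for the integral of exp(i*omega*t) * A*exp(-i*omega0*t) over -T < t < T."""
    h = 2.0 * T / n
    total = 0j
    for j in range(n):
        t = -T + (j + 0.5) * h
        total += A * cmath.exp(1j * (omega - omega0) * t)
    return total * h

omega0 = 2.0 * math.pi * 440.0        # an assumed 440 Hz note
for T in (0.01, 0.03):                # the second note lasts three times as long
    first_zero = math.pi / T          # offset of the first zero from omega0
    print("T =", T, "| peak height:", note_spectrum(omega0, omega0, T),
          "| width between first zeros:", 2.0 * first_zero)
```

The longer note has a peak three times as high and one third as wide, in line with the description of the two spectra.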



Example 

Though you can do these integrals numerically, and when you are dealing with real data you will have to, it's nice to
have some analytic examples to play with. I've already shown, Eq. (15.7), how the Fourier transform of a Gaussian is
simple, so start from there.

$$\text{If}\quad g(\omega) = e^{-(\omega-\omega_0)^2/\sigma^2} \quad\text{then}\quad f(t) = \frac{\sigma}{2\sqrt{\pi}}\,e^{-i\omega_0t}\,e^{-\sigma^2t^2/4}$$

If there are several frequencies, the result is a sum.

$$g(\omega) = \sum_n A_n\,e^{-(\omega-\omega_n)^2/\sigma_n^2} \quad\Longleftrightarrow\quad f(t) = \sum_n A_n\,\frac{\sigma_n}{2\sqrt{\pi}}\,e^{-i\omega_nt}\,e^{-\sigma_n^2t^2/4}$$

In a more common circumstance you will have the time series, $f(t)$, and will want to obtain the frequency decomposition,
$g(\omega)$, though for this example I worked backwards. The function of time is real, but the transformed function $g$ is complex.
Because $f$ is real, it follows that $g$ satisfies $g(-\omega) = g^*(\omega)$. See problem 15.13.




This example has four main peaks in the frequency spectrum. The real part of g is an even function and the 
imaginary part is odd. 
This is another example with four main peaks.

In either case, if you simply look at the function of time on the left it isn't obvious what sort of frequencies are 
present. That's why there are standard, well-developed computer programs to do the Fourier analysis. 

15.4 Derivatives 

There are a few simple, but important relations involving differentiation. What is the Fourier transform of the derivative 
of a function? Do some partial integration. 


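$$\mathcal{F}(\dot f) = \int_{-\infty}^{\infty} dt\,e^{i\omega t}\,\frac{df}{dt} = e^{i\omega t}f(t)\Big|_{-\infty}^{\infty} - i\omega\int_{-\infty}^{\infty} dt\,e^{i\omega t}f(t) = -i\omega\,\mathcal{F}(f) \tag{15.12}$$

Here I've introduced the occasionally useful notation that $\mathcal{F}(f)$ is the Fourier transform of $f$. The boundary terms in
the partial integration will go to zero if you assume that the function $f$ approaches zero at infinity.

The $n^{\text{th}}$ time derivative simply gives you more factors: $(-i\omega)^n$ on the transformed function.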
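The derivative rule is simple to test numerically with the sign convention of section 15.3. In this sketch the test function $e^{-t^2}\cos 3t$, the frequency $\omega = 1.7$, and the helper names are assumptions chosen only so that the boundary terms vanish and the derivative is known in closed form.

```python
import cmath
import math

def transform(func, omega, half_width=10.0, n=4000):
    """Trapezoid rule for the integral of exp(i*omega*t) * func(t) dt over |t| < half_width."""
    h = 2.0 * half_width / n
    total = 0j
    for j in range(n + 1):
        t = -half_width + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(1j * omega * t) * func(t)
    return total * h

def f(t):
    return math.exp(-t * t) * math.cos(3.0 * t)

def f_dot(t):
    # Derivative of f, worked out by hand with the product rule.
    return math.exp(-t * t) * (-2.0 * t * math.cos(3.0 * t) - 3.0 * math.sin(3.0 * t))

omega = 1.7
print("transform of df/dt:      ", transform(f_dot, omega))
print("-i*omega*transform of f: ", -1j * omega * transform(f, omega))
```

Repeating the comparison with the second derivative would bring in one more factor of $-i\omega$, as the text states for the $n$th derivative.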

15.5 Green’s Functions 

This technique showed up in the chapter on ordinary differential equations, section 4.6, as a method to solve the forced 
harmonic oscillator. In that instance I said that you can look at a force as a succession of impulses, as if you’re looking 
at the atomic level and visualizing a force as many tiny collisions by atoms. Here I’ll get to the same sort of result as an 
application of transform methods. The basic technique is to Fourier transform everything in sight. 

The damped, forced harmonic oscillator differential equation is

$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = F_0(t) \tag{15.13}$$

Multiply by $e^{i\omega t}$ and integrate over all time. You do the transforms of the derivatives by partial integration as in
Eq. (15.12).

$$\int_{-\infty}^{\infty} dt\,e^{i\omega t}\,\bigl[\text{Eq. (15.13)}\bigr] = -m\omega^2\tilde x - ib\omega\tilde x + k\tilde x = \tilde F_0, \quad\text{where}\quad \tilde x(\omega) = \int_{-\infty}^{\infty} dt\,e^{i\omega t}x(t)$$

This is an algebraic equation that is easy to solve for the function $\tilde x(\omega)$.

$$\tilde x(\omega) = \frac{\tilde F_0(\omega)}{-m\omega^2 - ib\omega + k}$$

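Now use the inverse transform to recover the function $x(t)$.

$$\begin{aligned}
x(t) &= \int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,e^{-i\omega t}\,\tilde x(\omega) = \int\frac{d\omega}{2\pi}\,e^{-i\omega t}\,\frac{\tilde F_0(\omega)}{-m\omega^2 - ib\omega + k} \\
&= \int\frac{d\omega}{2\pi}\,\frac{e^{-i\omega t}}{-m\omega^2 - ib\omega + k}\int dt'\,F_0(t')\,e^{i\omega t'} \\
&= \int dt'\,F_0(t')\int\frac{d\omega}{2\pi}\,\frac{e^{-i\omega t}\,e^{i\omega t'}}{-m\omega^2 - ib\omega + k}
\end{aligned} \tag{15.14}$$

In the last line I interchanged the order of integration, and in the preceding line I had to be sure to use another symbol
$t'$ in the second integral, not $t$. Now do the $\omega$ integral.

$$\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,\frac{e^{-i\omega t}\,e^{i\omega t'}}{-m\omega^2 - ib\omega + k} = \int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,\frac{e^{-i\omega(t-t')}}{-m\omega^2 - ib\omega + k} \tag{15.15}$$

To do this, use contour integration. The singularities of the integrand are at the roots of the denominator,
$-m\omega^2 - ib\omega + k = 0$. They are

$$\omega = \frac{-ib \pm\sqrt{-b^2 + 4km}}{2m} = \omega_\pm$$

[Figure: the two poles in the complex $\omega$-plane and the contour $C_1$.]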

Both of these poles are in the lower half complex plane. The contour integral $C_1$ is along the real axis, and now I have
to decide where to push the contour in order to use the residue theorem. This will be governed by the exponential,
$e^{-i\omega(t-t')}$.

First take the case $t < t'$; then $e^{-i\omega(t-t')}$ is of the form $e^{+i\omega}$, so in the complex $\omega$-plane its behavior in the $\pm i$
directions is as a decaying exponential toward $+i\infty$ ($\propto e^{-|\omega|}$). It is a rising exponential toward $-i\infty$ ($\propto e^{+|\omega|}$). This means
that pushing the contour $C_1$ up toward $C_2$ and beyond will make this integral go to zero. I've crossed no singularities,
so that means that Eq. (15.15) is zero for $t < t'$.

Next, the case that $t > t'$. Now $e^{-i\omega(t-t')}$ is of the form $e^{-i\omega}$, so its behavior is reversed from that of the
preceding paragraph. It dies off rapidly toward $-i\infty$ and rises in the opposite direction. That means that I must push
the contour in the opposite direction, down to $C_3$ and to $C_4$. Because of the decaying exponential, the large arc of the
contour that is pushed down to $-i\infty$ gives zero for its integral; the two lines that parallel the $i$-axis cancel each other;
only the two residues remain.



dw e t/s> 

2i r —mu 2 — ibu + k 


-2m ^ Res 


CJ± 




(15.16) 


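The denominator in Eq. (15.15) is $-m(\omega - \omega_+)(\omega - \omega_-)$. Use this form to compute the residues. Leave the $1/2\pi$ aside for the moment and you have

$$\frac{e^{-i\omega(t-t')}}{-m\omega^2 - ib\omega + k} = \frac{e^{-i\omega(t-t')}}{-m(\omega-\omega_+)(\omega-\omega_-)}$$

The residues of this at $\omega_\pm$ are the coefficients of these first order poles.

$$\text{at } \omega_+:\quad \frac{e^{-i\omega_+(t-t')}}{-m(\omega_+-\omega_-)} \qquad\text{and at } \omega_-:\quad \frac{e^{-i\omega_-(t-t')}}{-m(\omega_--\omega_+)}$$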


The explicit values of $\omega_\pm$ are

$$\omega_+ = \frac{-ib + \sqrt{-b^2+4km}}{2m} \qquad\text{and}\qquad \omega_- = \frac{-ib - \sqrt{-b^2+4km}}{2m}$$

Let

$$\omega' = \frac{\sqrt{-b^2+4km}}{2m} \qquad\text{and}\qquad \gamma = \frac{b}{2m}$$

The difference that appears in the preceding equation is then

$$\omega_+ - \omega_- = (\omega' - i\gamma) - (-\omega' - i\gamma) = 2\omega'$$


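Eq. (15.16) is then

$$\begin{aligned}
\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,\frac{e^{-i\omega(t-t')}}{-m\omega^2-ib\omega+k}
&= -i\left[\frac{e^{-i(\omega'-i\gamma)(t-t')}}{-2m\omega'} + \frac{e^{-i(-\omega'-i\gamma)(t-t')}}{+2m\omega'}\right]\\
&= \frac{-i}{2m\omega'}\,e^{-\gamma(t-t')}\Bigl[-e^{-i\omega'(t-t')} + e^{+i\omega'(t-t')}\Bigr]\\
&= \frac{1}{m\omega'}\,e^{-\gamma(t-t')}\sin\bigl(\omega'(t-t')\bigr)
\end{aligned}$$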


Put this back into Eq. (15.14) and you have

$$x(t) = \int_{-\infty}^{t} dt'\,F_0(t')\,G(t-t'), \qquad\text{where}\qquad G(t-t') = \frac{1}{m\omega'}\,e^{-\gamma(t-t')}\sin\bigl(\omega'(t-t')\bigr) \tag{15.17}$$


If you eliminate the damping term, setting $b = 0$, this is exactly the same as Eq. (4.34). The integral stops at $t' = t$
because the Green’s function vanishes beyond there. The motion at time t is determined by the force that was applied 
in the past, not the future. 
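
As a rough numerical check of Eq. (15.17), the sketch below evaluates $G$ and tests by finite differences that it satisfies the homogeneous oscillator equation for $t > t'$, starts from zero, and starts with slope $1/m$. The function names and the parameter values are illustrative assumptions, not part of the derivation.

```python
import math

def green(tau, m, b, k):
    """G(tau) of Eq. (15.17) for the underdamped case b*b < 4*k*m; zero for tau <= 0."""
    if tau <= 0:
        return 0.0
    gamma = b / (2 * m)
    omega_p = math.sqrt(4 * k * m - b * b) / (2 * m)
    return math.exp(-gamma * tau) * math.sin(omega_p * tau) / (m * omega_p)

def ode_residual(tau, m, b, k, h=1e-3):
    """m G'' + b G' + k G by central differences; it should be close to zero for tau > h."""
    g0 = green(tau, m, b, k)
    gp = green(tau + h, m, b, k)
    gm = green(tau - h, m, b, k)
    d1 = (gp - gm) / (2 * h)
    d2 = (gp - 2 * g0 + gm) / (h * h)
    return m * d2 + b * d1 + k * g0

m, b, k = 2.0, 0.6, 5.0          # illustrative values only
residuals = [abs(ode_residual(t, m, b, k)) for t in (0.5, 1.0, 3.0)]
initial_slope = green(1e-6, m, b, k) / 1e-6   # should be close to 1/m
```

The jump in slope from 0 to $1/m$ at $t = t'$ is what the unit impulse on the right side of the equation produces; everything after that is free damped motion.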


Example 

Apply a constant external force to a damped harmonic oscillator, starting it at time t = 0 and keeping it on. What is 
the resulting motion? 


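$$F_0(t) = \begin{cases} 0 & (t < 0)\\ F_1 & (t > 0)\end{cases}$$

where $F_1$ is a constant. The equation (15.17) says that the solution is ($t > 0$)

$$\begin{aligned}
x(t) &= \int_{-\infty}^{t}dt'\,F_0(t')\,G(t-t') = \int_0^t dt'\,F_1\,G(t-t')\\
&= F_1\int_0^t dt'\,\frac{1}{m\omega'}\,e^{-\gamma(t-t')}\sin\bigl(\omega'(t-t')\bigr)\\
&= \frac{F_1}{2im\omega'}\int_0^t dt'\,e^{-\gamma(t-t')}\Bigl[e^{i\omega'(t-t')} - e^{-i\omega'(t-t')}\Bigr]\\
&= \frac{F_1}{2im\omega'}\left[\frac{1}{\gamma-i\omega'}\,e^{(-\gamma+i\omega')(t-t')} - \frac{1}{\gamma+i\omega'}\,e^{(-\gamma-i\omega')(t-t')}\right]_{t'=0}^{t'=t}\\
&= \frac{F_1}{2im\omega'}\left[\frac{2i\omega'}{\gamma^2+\omega'^2} - \frac{e^{-\gamma t}}{\gamma^2+\omega'^2}\bigl[2i\gamma\sin\omega't + 2i\omega'\cos\omega't\bigr]\right]\\
&= \frac{F_1}{m(\gamma^2+\omega'^2)}\left[1 - e^{-\gamma t}\Bigl[\cos\omega't + \frac{\gamma}{\omega'}\sin\omega't\Bigr]\right]
\end{aligned}$$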


Check the answer. If $t = 0$ it is correct; $x(0) = 0$ as it should.

If $t \to \infty$, $x(t)$ goes to $F_1/\bigl(m(\gamma^2 + \omega'^2)\bigr)$; is this correct? Check it out! And maybe simplify the result in the process.
Is the small time behavior correct?
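
One way to carry out part of this check numerically is sketched below: it compares the closed form with a direct Simpson's-rule evaluation of the Green's function integral, and looks at the late-time value. The helper names and the numbers are assumptions chosen for illustration; the late-time comparison uses the identity $m(\gamma^2 + \omega'^2) = k$, which follows from the definitions of $\gamma$ and $\omega'$.

```python
import math

def step_response(t, m, b, k, f1):
    """Closed form from the example: constant force f1 switched on at t = 0."""
    gamma = b / (2 * m)
    omega_p = math.sqrt(4 * k * m - b * b) / (2 * m)
    decay = math.exp(-gamma * t)
    bracket = 1 - decay * (math.cos(omega_p * t) + gamma / omega_p * math.sin(omega_p * t))
    return f1 / (m * (gamma ** 2 + omega_p ** 2)) * bracket

def step_response_quadrature(t, m, b, k, f1, n=2000):
    """The same quantity from the integral of F_1 G(t - t') over 0 < t' < t (n must be even)."""
    gamma = b / (2 * m)
    omega_p = math.sqrt(4 * k * m - b * b) / (2 * m)

    def integrand(tp):
        tau = t - tp
        return f1 * math.exp(-gamma * tau) * math.sin(omega_p * tau) / (m * omega_p)

    h = t / n
    total = integrand(0.0) + integrand(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

m, b, k, f1 = 2.0, 0.6, 5.0, 3.0        # illustrative values only
x_closed = step_response(4.0, m, b, k, f1)
x_numeric = step_response_quadrature(4.0, m, b, k, f1)
x_late = step_response(200.0, m, b, k, f1)   # compare with the static displacement f1 / k
```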


15.6 Sine and Cosine Transforms 

Return to the first section of this chapter and look again at the derivation of the Fourier transform. It started with the 
Fourier series on the interval — L < x < L and used periodic boundary conditions to define which series to use. Then 
the limit as $L \to \infty$ led to the transform.

What if you know the function only for positive values of its argument? If you want to write $f(x)$ as a series when you know it only for $0 < x < L$, it doesn't make much sense to start the way I did in section 15.1. Instead, pick the boundary condition at $x = 0$ carefully because this time the boundary won't go away in the limit that $L \to \infty$. The two common choices to define the basis are

$$u(0) = 0 = u(L), \qquad\text{and}\qquad u'(0) = 0 = u'(L) \tag{15.18}$$

Start with the first, then $u_n(x) = \sin(n\pi x/L)$ for positive $n$. The equation (15.2) is unchanged, save for the limits.

$$f(x) = \sum_{1}^{\infty} a_n u_n(x), \qquad\text{and}\qquad \langle u_m, f\rangle = \Bigl\langle u_m, \sum_{n=1}^{\infty} a_n u_n\Bigr\rangle = a_m\langle u_m, u_m\rangle$$

In this basis, $\langle u_m, u_m\rangle = L/2$, so

$$f(x) = \sum_{n=1}^{\infty}\frac{\langle u_n, f\rangle}{\langle u_n, u_n\rangle}\,u_n(x) = \frac{2}{L}\sum_{n=1}^{\infty}\langle u_n, f\rangle\,u_n(x)$$

Now explicitly use the sine functions to finish the manipulation, and as in the work leading up to Eq. (15.3), denote $k_n = \pi n/L$, and the difference $\Delta k_n = \pi/L$.


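$$\begin{aligned}
f(x) &= \frac{2}{L}\sum_{1}^{\infty}\int_0^L dx'\,f(x')\,\sin\frac{n\pi x'}{L}\,\sin\frac{n\pi x}{L}\\
&= \frac{2}{\pi}\sum_{1}^{\infty}\sin\frac{n\pi x}{L}\,\Delta k_n\int_0^L dx'\,f(x')\,\sin\frac{n\pi x'}{L}
\end{aligned} \tag{15.19}$$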


For a given value of k, define the integral 


$$g_L(k) = \int_0^L dx'\,\sin(kx')\,f(x')$$


If the function $f$ vanishes sufficiently fast as $x' \to \infty$, this integral will have a limit as $L \to \infty$. Call that limit $g(k)$. Look back at Eq. (15.19) and you see that for large $L$ the last factor will be approximately $g(k_n)$, where the approximation becomes exact as $L \to \infty$. Rewrite that expression as

$$f(x) \approx \frac{2}{\pi}\sum_{1}^{\infty}\sin(k_n x)\,\Delta k_n\,g(k_n) \tag{15.20}$$

As $L \to \infty$, you have $\Delta k_n \to 0$, and that turns Eq. (15.20) into an integral.

$$f(x) = \frac{2}{\pi}\int_0^\infty dk\,\sin kx\;g(k), \qquad\text{where}\qquad g(k) = \int_0^\infty dx\,\sin kx\;f(x) \tag{15.21}$$


This is the Fourier Sine transform. For a parallel calculation leading to the Cosine transform, see problem 15.22, where you will find that the equations are the same except for changing sine to cosine.

$$f(x) = \frac{2}{\pi}\int_0^\infty dk\,\cos kx\;g(k), \qquad\text{where}\qquad g(k) = \int_0^\infty dx\,\cos kx\;f(x) \tag{15.22}$$

What is the sine transform of a derivative? Integrate by parts, remembering that $f$ has to approach zero at infinity for any of this to make sense.

$$\int_0^\infty dx\,\sin kx\;f'(x) = \sin kx\,f(x)\Big|_0^\infty - k\int_0^\infty dx\,\cos kx\;f(x) = -k\int_0^\infty dx\,\cos kx\;f(x)$$

For the second derivative, repeat the process.

$$\int_0^\infty dx\,\sin kx\;f''(x) = kf(0) - k^2\int_0^\infty dx\,\sin kx\;f(x) \tag{15.23}$$
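
A small numerical illustration of Eqs. (15.21) and (15.23), as a sketch only: the test function $f(x) = e^{-ax}$ is a choice made here for convenience (its sine transform is the standard result $k/(a^2+k^2)$), and the cutoff and step count in `sine_transform` are assumptions that work for a rapidly decaying $f$.

```python
import math

def sine_transform(f, k, upper=60.0, n=60000):
    """Approximate g(k) = integral from 0 to infinity of sin(kx) f(x) dx.

    Simpson's rule on [0, upper]; assumes f is negligible beyond `upper` and n is even.
    """
    h = upper / n
    total = math.sin(k * upper) * f(upper)   # the x = 0 endpoint contributes sin(0) = 0
    for i in range(1, n):
        x = i * h
        total += (4 if i % 2 else 2) * math.sin(k * x) * f(x)
    return total * h / 3

a = 1.5                                        # illustrative decay constant

def f(x):
    return math.exp(-a * x)

def f_second(x):
    return a * a * math.exp(-a * x)            # second derivative of f

k = 2.0
g = sine_transform(f, k)                       # expect k / (a^2 + k^2)
lhs = sine_transform(f_second, k)              # left side of Eq. (15.23)
rhs = k * f(0.0) - k * k * g                   # right side of Eq. (15.23)
```

The $kf(0)$ term is the part that has no counterpart in the full-line Fourier transform; it is the boundary at $x = 0$ making itself felt.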


15.7 Wiener-Khinchine Theorem 

If a function of time represents the pressure amplitude of a sound wave or the electric field of an electromagnetic wave 
the power received is proportional to the amplitude squared. By Parseval's identity, the absolute square of the Fourier 
transform has an integral proportional to the integral of this power. This leads to the interpretation of the transform 
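squared as some sort of power density in frequency. $|g(\omega)|^2\,d\omega$ is then a power received in this frequency interval. When
this energy interpretation isn't appropriate, $|g(\omega)|^2$ is called the "spectral density." A useful result appears by looking
at the Fourier transform of this function.

$$\begin{aligned}
\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,g^*(\omega)\,g(\omega)\,e^{-i\omega t}
&= \int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,g^*(\omega)\,e^{-i\omega t}\int_{-\infty}^{\infty}dt'\,f(t')\,e^{i\omega t'}\\
&= \int_{-\infty}^{\infty}dt'\,f(t')\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,g^*(\omega)\,e^{i\omega t'}e^{-i\omega t}\\
&= \int_{-\infty}^{\infty}dt'\,f(t')\left[\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,g(\omega)\,e^{-i\omega(t'-t)}\right]^*\\
&= \int_{-\infty}^{\infty}dt'\,f(t')\,f(t'-t)^*
\end{aligned} \tag{15.24}$$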


When you're dealing with a real $f$, this last integral is called the autocorrelation function. It tells you in some average
way how closely related a signal is to the same signal at some other time. If the signal that you are examining is just noise 
then what happens now will be unrelated to what happened a few milliseconds ago and this autocorrelation function will 
be close to zero. If there is structure in the signal then this function gives a lot of information about it. 
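
For sampled data the same relation can be tried out directly. The sketch below assumes a periodic (circular) discrete version of Eq. (15.24), with a small textbook radix-2 transform written out so that the example needs no external library; the discrete sign and normalization conventions are assumptions of this sketch and differ in detail from the continuous transform above. For a real signal the two routes should agree.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 transform; len(values) must be a power of two.

    Unnormalized: after an inverse transform the caller divides by N.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out

def autocorrelation_direct(signal):
    """Circular autocorrelation: sum over i of f[i] * f[(i - shift) mod N]."""
    n = len(signal)
    return [sum(signal[i] * signal[(i - s) % n] for i in range(n)) for s in range(n)]

def autocorrelation_via_transform(signal):
    """The same quantity computed as the inverse transform of |g|^2."""
    n = len(signal)
    power = [abs(c) ** 2 for c in fft(signal)]
    return [c.real / n for c in fft(power, inverse=True)]

signal = [((7 * i * i + 3 * i) % 11) - 5.0 for i in range(16)]   # arbitrary real test data
direct = autocorrelation_direct(signal)
via_transform = autocorrelation_via_transform(signal)
max_difference = max(abs(d - v) for d, v in zip(direct, via_transform))
```

The direct sum costs on the order of $N^2$ multiplications, while the transform route costs on the order of $N\log N$, which is where the speed advantage for large data sets comes from.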

The left side of this whole equation involves two Fourier transforms ($f \to g$, then $|g|^2$ to its transform). The
right side of this theorem seems to be easier and more direct to compute than the left, so why is this relation useful?
It is because of the existence of the FFT, the "Fast Fourier Transform," an algorithm that makes the process of Fourier
transforming a set of data far more efficient than doing it by straightforward numerical integration methods, faster
by factors that reach into the thousands for large data sets.

Problems

15.1 Invert the Fourier transform, $g$, in Eq. (15.7).

15.2 What is the Fourier transform of $e^{ik_0x - x^2/\sigma^2}$? Ans: A translation of the $k_0 = 0$ case

15.3 What is the Fourier transform of $xe^{-x^2/\sigma^2}$?

15.4 What is the square of the Fourier transform operator? That is, what is the Fourier transform of the Fourier 
transform? 

15.5 A function is defined to be 

$$f(x) = \begin{cases} 1 & (-a < x < a)\\ 0 & (\text{elsewhere})\end{cases}$$

What is the convolution of $f$ with itself, $(f * f)(x)$? And graph it of course. Start by graphing both $f(x')$ and the other factor that goes into the convolution integral.

Ans: $(2a - |x|)$ for $(-2a < x < +2a)$, and zero elsewhere.

15.6 Two functions are 

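$$f_1(x) = \begin{cases} 1 & (a < x < b)\\ 0 & (\text{elsewhere})\end{cases} \qquad\text{and}\qquad f_2(x) = \begin{cases} 1 & (A < x < B)\\ 0 & (\text{elsewhere})\end{cases}$$

What is the convolution of $f_1$ with $f_2$? And graph it.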

15.7 Derive these properties of the convolution: 

(a) $f * g = g * f$ (b) $f * (g * h) = (f * g) * h$ (c) $\delta(f * g) = f * \delta g + g * \delta f$ where $\delta f(t) = tf(t)$, $\delta g(t) = tg(t)$, etc. (d) What are $\delta^2(f * g)$ and $\delta^3(f * g)$?

15.8 Show that you can rewrite Eq. (15.9) as

$$\mathcal{F}(f * g) = \mathcal{F}(f)\cdot\mathcal{F}(g)$$

where the shorthand notation $\mathcal{F}(f)$ is the Fourier transform of $f$.

15.9 Derive Eq. (15.10) from Eq. (15.9).

15.10 What is the analog of Eq. (15.10) for two different functions? That is, relate the scalar product of two functions,

$$\langle f_1, f_2\rangle = \int_{-\infty}^{\infty} f_1^*(x)\,f_2(x)\,dx$$

to their Fourier transforms. Ans: $\int g_1^*(k)\,g_2(k)\,dk/2\pi$

15.11 In the derivation of the harmonic oscillator Green’s function, and starting with Eq. (15.15), I assumed that the 
oscillator is underdamped: that $b^2 < 4km$. Now assume the reverse, the overdamped case, and repeat the calculation.

15.12 Repeat the preceding problem, but now do the critically damped case, for which $b^2 = 4km$. Compare your result
to the result that you get by taking the limit of critical damping in the preceding problem and in Eq. (15.17).

15.13 Show that if $f(t)$ is real then the Fourier transform satisfies $g(-\omega) = g^*(\omega)$.

What are the properties of $g$ if $f$ is respectively even or odd?

15.14 Evaluate the Fourier transform of

$$f(x) = \begin{cases} A(a - |x|) & (-a < x < a)\\ 0 & (\text{otherwise})\end{cases}$$

How do the properties of the transform vary as the parameter $a$ varies?

Ans: $2A(1 - \cos ka)/k^2$

15.15 Evaluate the Fourier transform of $Ae^{-\alpha|x|}$. Invert the transform to verify that it takes you back to the original
function. Ans: $2A\alpha/(\alpha^2 + k^2)$

15.16 Given that the Fourier transform of $f(x)$ is $g(k)$, what is the Fourier transform of the function translated a
distance $a$ to the right, $f_1(x) = f(x - a)$?


15.17 Schroedinger's equation is

$$-i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2} + V(x)\,\psi$$

Fourier transform the whole equation with respect to $x$, and find the equation for $\Phi(k,t)$, the Fourier transform of
$\psi(x,t)$. The result will not be a differential equation. Ans: $-i\hbar\,\partial\Phi(k,t)/\partial t = (\hbar^2k^2/2m)\Phi + (v * \Phi)/2\pi$

15.18 Take the Green's function solution to Eq. (15.13) as found in Eq. (15.17) and take the limit as both $k$ and $b$ go
to zero. Verify that the resulting single integral satisfies the original second order differential equation.

15.19 (a) In problem 15.18 you have the result that a double integral (undoing two derivatives) can be written as a
single integral. Now solve the equation

$$\frac{d^3x}{dt^3} = F(t)$$

[Figure: the contour $C$ pushed up over the pole at the origin.]

directly, using the same method as for Eq. (15.13). You will get a pole at the origin and how do you handle this, where
the contour of integration goes straight through the origin? Answer: Push the contour up as in the figure. Why? This
is what's called the "retarded solution" for which the value of $x(t)$ depends on only those values of $F(t')$ in the past. If
you try any other contour to define the integral you will not get this property. (And sometimes there's a reason to make
another choice.)

(b) Pick a fairly simple $F$ and verify that this gives the right answer.

Ans: $\frac{1}{2}\int_{-\infty}^{t} dt'\,F(t')(t-t')^2$

15.20 Repeat the preceding problem for the fourth derivative. Would you care to conjecture what $3\tfrac12$ integrals might
be? Then perhaps an arbitrary non-integer order?

Ans: $\frac{1}{6}\int_{-\infty}^{t} dt'\,F(t')(t-t')^3$

15.21 What is the Fourier transform of $xf(x)$? Ans: $ig'(k)$

15.22 Repeat the calculations leading to Eq. (15.21), but for the boundary conditions $u'(0) = 0 = u'(L)$, leading to
the Fourier cosine transform.

15.23 For both the sine and cosine transforms, the original function $f(x)$ was defined for positive $x$ only. Each of these
transforms defines an extension of $f$ to negative $x$. This happens because you compute $g(k)$ and from it get an inverse
transform. Nothing stops you from putting a negative value of $x$ into the answer. What are the results?

15.24 What are the sine and cosine transforms of $e^{-\alpha x}$? In each case evaluate the inverse transform.

15.25 What is the sine transform of $f(x) = 1$ for $0 < x < L$ and $f(x) = 0$ otherwise? Evaluate the inverse transform.

15.26 Repeat the preceding calculation for the cosine transform. Graph the two transforms and compare them, including
their dependence on $L$.

15.27 Choose any different way around the pole in problem 15.19, and compute the difference between the result with 
your new contour and the result with the old one. Note: Plan ahead before you start computing. 


Calculus of Variations 


The biggest step from derivatives with one variable to derivatives with many variables is from one to two. After 
that, going from two to three was just more algebra and more complicated pictures. Now the step will be from a finite 
number of variables to an infinite number. That will require a new set of tools, yet in many ways the techniques are not 
very different from those you know. 

If you've never read chapter 19 of volume II of the Feynman Lectures on Physics, now would be a good time.
It's a classic introduction to the area. For a deeper look at the subject, pick up MacCluer’s book referred to in the 
Bibliography at the beginning of this book. 


16.1 Examples 

What line provides the shortest distance between two points? A straight line of course, no surprise there. But not so 
fast, with a few twists on the question the result won't be nearly as obvious. How do I measure the length of a curved 
(or even straight) line? Typically with a ruler. For the curved line I have to do successive approximations, breaking the 
curve into small pieces and adding the finite number of lengths, eventually taking a limit to express the answer as an 
integral. Even with a straight line I will do the same thing if my ruler isn't long enough. 

Put this in terms of how you do the measurement: Go to a local store and purchase a ruler. It's made out of 
some real material, say brass. The curve you’re measuring has been laid out on the ground, and you move along it, 
counting the number of times that you use the ruler to go from one point on the curve to another. If the ruler measures 
in decimeters and you lay it down 100 times along the curve, you have your first estimate for the length, 10.0 meters. 
Do it again, but use a centimeter length and you need 1008 such lengths: 10.08 meters. 

That’s tedious, but simple. Now do it again for another curve and compare their lengths. Here comes the twist: 
The ground is not at a uniform temperature. Perhaps you’re making these measurements over a not-fully-cooled lava 
flow in Hawaii. Brass will expand when you heat it, so if the curve whose length you’re measuring passes over a hot spot, 
then the ruler will expand when you place it down, and you will need to place it down fewer times to get to the end of 
the curve. You will measure the curve as shorter. Now it is not so clear which curve will have the shortest (measured) 
length. If you take the straight line and push it over so that it passes through a hotter region, then you may get a smaller 
result. 

Let the coefficient of expansion of the ruler be $\alpha$, assumed constant. For modest temperature changes, the length
of the ruler is $\ell' = (1 + \alpha\Delta T)\ell$. The length of a curve as measured with this ruler is

$$\int\frac{d\ell}{1 + \alpha T} \tag{16.1}$$

Here I'm taking $T = 0$ as the base temperature for the ruler and $d\ell$ is the length you would use if everything stayed
at this temperature. With this measure for length, it becomes an interesting problem to discover which path has the
shortest "length." The formal term for the path of shortest length is geodesic.

In section 13.1 you saw integrals that looked very much like this, though applied to a different problem. There I
looked at the time it takes a particle to slide down a curve under gravity. That time is the integral of $dt = d\ell/v$, where
$v$ is the particle's speed, a function of position along the path. Using conservation of energy, the expression for the time
to slide down a curve was Eq. (13.6).

$$\int dt = \int\frac{d\ell}{\sqrt{(2E/m) + 2gy}} \tag{16.2}$$


In that chapter I didn't attempt to answer the question about which curve provides the quickest route to the end, but 
in this chapter I will. Even qualitatively you can see a parallel between these two problems. You get a shorter length by 
pushing the curve into a region of higher temperature. You get a shorter time by pushing the curve lower (larger $y$). In
the latter case, this means that you drop fast to pick up speed quickly. In both cases the denominator in the integral
is larger. You can overdo it of course. Push the curve too far and the value of $\int d\ell$ itself can become too big. It's a
balance.

In problems 2.35 and 2.39 you looked at the amount of time it takes light to travel from one point to another 
along various paths. Is the time a minimum, a maximum, or neither? In these special cases, you saw that this is related 
to the focus of the lens or of the mirror. This is a very general property of optical systems, and is an extension of some 
of the ideas in the first two examples above. 

These questions are sometimes pretty and elegant, but are they related to anything else? Yes. Newton’s classical 
mechanics can be reformulated in this language and it leads to powerful methods to set up the equations of motion in 
complicated problems. The same ideas lead to useful approximation techniques in electromagnetism, allowing you to 
obtain high-accuracy solutions to problems for which there is no solution by other means. 

16.2 Functional Derivatives 

It is time to get specific and to implement* these concepts. All the preceding examples can be expressed in the same
general form. In a standard $x$-$y$ rectangular coordinate system,

$$d\ell = \sqrt{dx^2 + dy^2} = dx\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = dx\sqrt{1 + y'^2}$$

* If you find the methods used in this section confusing, you may prefer to look at an alternate approach to the
subject as described in section 16.6. Then return here.


Then Eq. (16.1) is

$$\int\frac{d\ell}{1 + \alpha T} = \int dx\,\frac{\sqrt{1 + y'^2}}{1 + \alpha T(x, y)} \tag{16.3}$$
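
To see the claim about hot spots in numbers, here is a sketch that evaluates Eq. (16.3) for two paths between the same endpoints. Everything specific in it is an assumption made for illustration: the Gaussian hot spot, the sine-shaped detour, and especially the value $\alpha T_{\max} = 3$, which is wildly exaggerated (for real brass $\alpha$ is roughly $2\times10^{-5}$ per kelvin) so that the effect is easy to see.

```python
import math

def temperature(x, y, t0, center=(0.5, 0.3), width=0.2):
    """Hypothetical hot spot: a Gaussian bump of height t0 centered off the straight line."""
    cx, cy = center
    return t0 * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / width ** 2)

def measured_length(y, dy, alpha, t0, n=2000):
    """Eq. (16.3) by Simpson's rule for a path y(x) on 0 <= x <= 1 with slope dy(x); n even."""
    def integrand(x):
        return math.sqrt(1 + dy(x) ** 2) / (1 + alpha * temperature(x, y(x), t0))

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

paths = {
    "straight": (lambda x: 0.0, lambda x: 0.0),
    "bent": (lambda x: 0.3 * math.sin(math.pi * x),
             lambda x: 0.3 * math.pi * math.cos(math.pi * x)),
}

alpha, t0 = 1.0, 3.0     # exaggerated on purpose
hot = {name: measured_length(y, dy, alpha, t0) for name, (y, dy) in paths.items()}
cold = {name: measured_length(y, dy, 0.0, t0) for name, (y, dy) in paths.items()}
```

With no temperature variation the straight line is the shorter path; with the hot spot turned on, the detour through it measures shorter, even though it is geometrically longer.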


This measured length depends on the path, and I've assumed that I can express the path with $y$ as a function of $x$.
No loops. You can allow more general paths by using another parametrization: $x(t)$ and $y(t)$. Then the same integral
becomes

$$\int dt\,\frac{\sqrt{\dot x^2 + \dot y^2}}{1 + \alpha T\bigl(x(t), y(t)\bigr)} \tag{16.4}$$


The equation (16.2) has the same form

$$\int dx\,\frac{\sqrt{1 + y'^2}}{\sqrt{(2E/m) + 2gy}}$$

And the travel time for light through an optical system is

$$\int dt = \int\frac{d\ell}{v} = \int dx\,\frac{\sqrt{1 + y'^2}}{v(x, y)}$$

where the speed of light is some known function of the position.

In all of these cases the output of the integral depends on the path taken. It is a functional of the path, a
scalar-valued function of a function variable. Denote the argument by square brackets.

$$I[y] = \int_a^b dx\,F\bigl(x, y(x), y'(x)\bigr) \tag{16.5}$$

The specific $F$ varies from problem to problem, but the preceding examples all have this general form, even when
expressed in the parametrized variables of Eq. (16.4).

The idea of differential calculus is that you can get information about a function if you try changing the independent
variable by a small amount. Do the same thing here. Now however the independent variable is the whole path, so I'll
change that path by some small amount and see what happens to the value of the integral $I$. This approach to the
subject is due to Lagrange. The development in section 16.6 comes from Euler.




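[Figure: a path $y$ and the varied path $y + \delta y$.]

$$\begin{aligned}
\Delta I &= I[y + \delta y] - I[y]\\
&= \int_{a+\Delta a}^{b+\Delta b} dx\,F\bigl(x, y(x) + \delta y(x), y'(x) + \delta y'(x)\bigr) - \int_a^b dx\,F\bigl(x, y(x), y'(x)\bigr)
\end{aligned} \tag{16.6}$$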

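The (small) function $\delta y(x)$ is the vertical displacement of the path in this coordinate system. To keep life simple
for the first attack on this problem, I'll take the special case for which the endpoints of the path are fixed. That is,

$$\Delta a = 0, \qquad \Delta b = 0, \qquad \delta y(a) = 0, \qquad \delta y(b) = 0$$

[Figure: the paths $y$ and $y + \delta y$ with common endpoints; $\delta y$ is the vertical displacement between them.]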


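To compute the value of Eq. (16.6) use the power series expansion of $F$, as in section 2.5.

$$\begin{aligned}
F(x + \Delta x, y + \Delta y, z + \Delta z) = F(x, y, z) &+ \frac{\partial F}{\partial x}\Delta x + \frac{\partial F}{\partial y}\Delta y + \frac{\partial F}{\partial z}\Delta z\\
&+ \frac{\partial^2 F}{\partial x^2}\frac{(\Delta x)^2}{2} + \frac{\partial^2 F}{\partial x\,\partial y}\Delta x\,\Delta y + \cdots
\end{aligned}$$

For now look at just the lowest order terms, linear in the changes, so ignore the second order terms. In this application,
there is no $\Delta x$.

$$F(x, y + \delta y, y' + \delta y') = F(x, y, y') + \frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y'$$

plus terms of higher order in $\delta y$ and $\delta y'$.

Put this into Eq. (16.6), and

$$\delta I = \int_a^b dx\left[\frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y'\right] \tag{16.7}$$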


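For example, let $F = x^2 + y^2 + y'^2$ on the interval $0 < x < 1$. Take a base path to be a straight line from $(0,0)$ to
$(1,1)$. Choose for the change in the path $\delta y(x) = \epsilon x(1 - x)$. This is simple and it satisfies the boundary conditions.

$$\begin{aligned}
I[y] &= \int_0^1 dx\,\bigl[x^2 + y^2 + y'^2\bigr] = \int_0^1 dx\,\bigl[x^2 + x^2 + 1^2\bigr] = \frac{5}{3}\\
I[y + \delta y] &= \int_0^1 dx\,\Bigl[x^2 + \bigl(x + \epsilon x(1 - x)\bigr)^2 + \bigl(1 + \epsilon(1 - 2x)\bigr)^2\Bigr] = \frac{5}{3} + \frac{1}{6}\epsilon + \frac{11}{30}\epsilon^2
\end{aligned} \tag{16.8}$$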


The value of Eq. (16.7) is

$$\delta I = \int_0^1 dx\,\bigl[2y\,\delta y + 2y'\,\delta y'\bigr] = \int_0^1 dx\,\bigl[2x\,\epsilon x(1 - x) + 2\epsilon(1 - 2x)\bigr] = \frac{1}{6}\epsilon$$
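
The numbers in Eq. (16.8) and the first-order change $\epsilon/6$ are easy to confirm by direct numerical integration. This is only a sketch; the helper names and the value of $\epsilon$ are assumptions chosen for illustration.

```python
def simpson(func, a, b, n=200):
    """Composite Simpson's rule with n (even) intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def functional(y, dy):
    """I[y] = integral over 0 < x < 1 of x^2 + y^2 + y'^2, for a path y with derivative dy."""
    return simpson(lambda x: x * x + y(x) ** 2 + dy(x) ** 2, 0.0, 1.0)

def varied(eps):
    """I[y + delta y] for the base path y = x and delta y = eps * x * (1 - x)."""
    return functional(lambda x: x + eps * x * (1 - x),
                      lambda x: 1 + eps * (1 - 2 * x))

base = varied(0.0)               # expect 5/3
eps = 0.01
change = varied(eps) - base      # expect eps/6 + 11 eps^2 / 30
first_order = eps / 6            # the linear term alone
```

The difference between `change` and `first_order` is the second-order term $11\epsilon^2/30$, which is what Eq. (16.7) deliberately leaves out.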


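Return to the general case of Eq. (16.7) and you will see that I've explicitly used only part of the assumption that
the endpoint of the path hasn't moved, $\Delta a = \Delta b = 0$. There's nothing in the body of the integral itself that constrains
the change in the $y$-direction, and I had to choose the function $\delta y$ by hand so that this constraint held. In order to use
the equations $\delta y(a) = \delta y(b) = 0$ more generally, there is a standard trick: integrate by parts. You'll always integrate by
parts in these calculations.

$$\int_a^b dx\,\frac{\partial F}{\partial y'}\,\delta y'(x) = \int_a^b dx\,\frac{\partial F}{\partial y'}\frac{d\,\delta y}{dx} = \frac{\partial F}{\partial y'}\,\delta y\,\Big|_a^b - \int_a^b dx\,\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\delta y(x)$$

This expression allows you to use the information that the path hasn't moved at its endpoints in the $y$ direction either.
The boundary term from this partial integration is

$$\frac{\partial F}{\partial y'}\,\delta y\,\Big|_a^b = \frac{\partial F}{\partial y'}\bigl(b, y(b)\bigr)\,\delta y(b) - \frac{\partial F}{\partial y'}\bigl(a, y(a)\bigr)\,\delta y(a) = 0$$

Put the last two equations back into the expression for $\delta I$, Eq. (16.7) and the result is

$$\delta I = \int_a^b dx\left[\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\right]\delta y \tag{16.9}$$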


Use this expression for the same example $F = x^2 + y^2 + y'^2$ with $y(x) = x$ and you have

$$\delta I = \int_0^1 dx\left[2y - \frac{d}{dx}2y'\right]\delta y = \int_0^1 dx\,[2x - 0]\,\epsilon x(1 - x) = \frac{1}{6}\epsilon$$


This is sort of like Eq. (8.16),

$$df = \vec G\cdot d\vec r = \operatorname{grad} f\cdot d\vec r = \nabla f\cdot d\vec r = \sum_k\frac{\partial f}{\partial x_k}dx_k = \frac{\partial f}{\partial x_1}dx_1 + \frac{\partial f}{\partial x_2}dx_2 + \cdots$$

The differential change in the function depends linearly on the change $d\vec r$ in the coordinates. It is a sum over the terms
with $dx_1$, $dx_2$, .... This is a precise parallel to Eq. (16.9), except that the sum over discrete index $k$ is now an integral
over the continuous index $x$. The change in $I$ is a linear functional of the change $\delta y$ in the independent variable $y$; this
$\delta y$ corresponds to the change $d\vec r$ in the independent variable $\vec r$ in the other case. The coefficient of the change, instead
of being called the gradient, is called the "functional derivative" though it's essentially the same thing.

$$\frac{\delta I}{\delta y} = \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right), \qquad \delta I[y, \delta y] = \int dx\,\frac{\delta I}{\delta y}\bigl(x, y(x), y'(x)\bigr)\,\delta y(x) \tag{16.10}$$

and for a change, I've indicated explicitly the dependence of $\delta I$ on the two functions $y$ and $\delta y$. This parallels the equation
(8.13). The statement that this functional derivative vanishes is called the Euler-Lagrange equation.

Return to the example $F = x^2 + y^2 + y'^2$, then

$$\frac{\delta I}{\delta y} = \frac{\delta}{\delta y}\int_0^1 dx\,\bigl[x^2 + y^2 + y'^2\bigr] = 2y - \frac{d}{dx}2y' = 2y - 2y''$$

What is the minimum value of $I$? Set this derivative to zero.

$$y'' - y = 0 \;\Longrightarrow\; y(x) = A\cosh x + B\sinh x$$

The boundary conditions $y(0) = 0$ and $y(1) = 1$ imply $y = B\sinh x$ where $B = 1/\sinh 1$. The value of $I$ at this point
is

$$I[B\sinh x] = \int_0^1 dx\,\bigl[x^2 + B^2\sinh^2 x + B^2\cosh^2 x\bigr] = \frac{1}{3} + \coth 1 \tag{16.11}$$

Is it a minimum? Yes, but just as with the ordinary derivative, you have to look at the next order terms to determine
that. Compare this value of $I[y] = 1.64637$ to the value $5/3$ found for the nearby function $y(x) = x$, evaluated in
Eq. (16.8).
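
The comparison can be made concrete with a short numerical sketch. It evaluates $I$ on the extremal path, on the straight line, and on two perturbed paths that keep the endpoints fixed; the sine-shaped bump and its size are assumptions picked only for illustration.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson's rule with n (even) intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def functional(y, dy):
    """I[y] = integral over 0 < x < 1 of x^2 + y^2 + y'^2."""
    return simpson(lambda x: x * x + y(x) ** 2 + dy(x) ** 2, 0.0, 1.0)

B = 1 / math.sinh(1.0)
extremal = functional(lambda x: B * math.sinh(x), lambda x: B * math.cosh(x))
straight = functional(lambda x: x, lambda x: 1.0)

# Perturb the extremal by bumps that vanish at x = 0 and x = 1.
bumped = [
    functional(lambda x, e=e: B * math.sinh(x) + e * math.sin(math.pi * x),
               lambda x, e=e: B * math.cosh(x) + e * math.pi * math.cos(math.pi * x))
    for e in (-0.1, 0.1)
]
```

Because this particular $F$ is quadratic in $y$ and $y'$, any fixed-endpoint change $\eta$ raises $I$ by exactly $\int (\eta^2 + \eta'^2)\,dx$, so both bumped values come out larger than the extremal one.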

Return to one of the examples in the introduction: what is the shortest distance between two points? For now
assume that there's no temperature variation. Write the length of a path for a function $y$ between fixed endpoints, take
the derivative, and set that equal to zero.

$$L[y] = \int_a^b dx\,\sqrt{1 + y'^2},$$

$$\frac{\delta L}{\delta y} = -\frac{d}{dx}\frac{y'}{\sqrt{1 + y'^2}} = -\frac{y''}{\sqrt{1 + y'^2}} + \frac{y'^2y''}{(1 + y'^2)^{3/2}} = \frac{-y''}{(1 + y'^2)^{3/2}}$$

For a minimum length then, y" = 0, and that’s a straight line. Surprise! 

Do you really have to work through this mildly messy manipulation? Sometimes, but not here. Just notice that
the derivative is in the form

$$\frac{\delta L}{\delta y} = -\frac{d}{dx}f(y') = 0 \tag{16.12}$$

so it doesn't matter what the particular $f$ is and you get a straight line. $f(y')$ is a constant so $y'$ must be constant too.
Not so fast! See section 16.9 for another opinion.


16.3 Brachistochrone 

Now for a tougher example, again from the introduction. In Eq. (16.2), which of all the paths between fixed initial
and final points provides the path of least time for a particle sliding along it under gravity? Such a path is called a
brachistochrone. This problem was first proposed by Bernoulli (one of them), and was solved by several people including
Newton, though it's unlikely that he used the methods developed here, as the general structure we now call the calculus
of variations was decades in the future.

Assume that the particle starts from rest so that $E = 0$, then

$$T[y] = \int_0^{x_0} dx\,\frac{\sqrt{1 + y'^2}}{\sqrt{2gy}} \tag{16.13}$$

For the minimum time, compute the derivative and set it to zero.

$$\sqrt{2g}\,\frac{\delta T}{\delta y} = -\frac{\sqrt{1 + y'^2}}{2y^{3/2}} - \frac{d}{dx}\frac{y'}{\sqrt{y}\sqrt{1 + y'^2}} = 0$$

This is starting to look intimidating, leading to an impossibly* messy differential equation. Is there another way? Yes.
Why must $x$ be the independent variable? What about using $y$? In the general setup leading to Eq. (16.10) nothing
changes except the symbols, and you have

$$I[x] = \int dy\,F(y, x, x') \;\longrightarrow\; \frac{\delta I}{\delta x} = \frac{\partial F}{\partial x} - \frac{d}{dy}\left(\frac{\partial F}{\partial x'}\right) \tag{16.14}$$


Equation (16.13) becomes

$$T[x] = \int_0^{y_0} dy\,\frac{\sqrt{1 + x'^2}}{\sqrt{2gy}} \tag{16.15}$$

The function $x$ does not appear explicitly in this integral, just its derivative $x' = dx/dy$. This simplifies the functional
derivative, and the minimum time now comes from the equation

$$\frac{\delta I}{\delta x} = 0 - \frac{d}{dy}\left(\frac{\partial F}{\partial x'}\right) = 0 \tag{16.16}$$


This is much easier. $d(\;)/dy = 0$ means that the object in parentheses is a constant.

$$\frac{\partial F}{\partial x'} = C = \frac{1}{\sqrt{2gy}}\,\frac{x'}{\sqrt{1 + x'^2}}$$


Solve this for $x'$ and you get (let $K = C\sqrt{2g}$)

$$x' = \frac{dx}{dy} = \sqrt{\frac{K^2y}{1 - K^2y}}, \qquad\text{so}\qquad x(y) = \int dy\,\sqrt{\frac{K^2y}{1 - K^2y}}$$

* Only improbably. See problem 16.12.

This is an elementary integral. Let $2a = 1/K^2$

$$x(y) = \int dy\,\frac{y}{\sqrt{2ay - y^2}} = \int dy\,\frac{y}{\sqrt{a^2 - a^2 + 2ay - y^2}} = \int dy\,\frac{(y - a) + a}{\sqrt{a^2 - (y - a)^2}}$$

Make the substitution $(y - a)^2 = z$ in the first half of the integral and $(y - a) = a\sin\theta$ in the second half.

$$\begin{aligned}
x(y) &= \frac{1}{2}\int dz\,\frac{1}{\sqrt{a^2 - z}} + \int\frac{a^2\cos\theta\,d\theta}{\sqrt{a^2 - a^2\sin^2\theta}}\\
&= -\sqrt{a^2 - z} + a\theta = -\sqrt{a^2 - (y - a)^2} + a\sin^{-1}\left(\frac{y - a}{a}\right) + C'
\end{aligned}$$

The boundary condition that $x(0) = 0$ determines $C' = a\pi/2$, and the other end of the curve determines $a$: $x(y_0) = x_0$.
You can rewrite this as

$$x(y) = -\sqrt{2ay - y^2} + a\cos^{-1}\left(\frac{a - y}{a}\right) \tag{16.17}$$


This is a cycloid. What's a cycloid and why does this equation describe one? See problem 16.2. 
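
Before working the problem, a quick numerical sanity check of Eq. (16.17) may be reassuring. The sketch below confirms by finite differences that $dx/dy = \sqrt{y/(2a - y)}$, and compares the curve with the standard rolling-circle parametrization $x = a(\theta - \sin\theta)$, $y = a(1 - \cos\theta)$ for $0 \le \theta \le \pi$. That parametrization is quoted here as an assumption, not derived; the value of $a$ is arbitrary.

```python
import math

def x_of_y(y, a):
    """Eq. (16.17), valid for 0 <= y <= 2a."""
    return -math.sqrt(2 * a * y - y * y) + a * math.acos((a - y) / a)

def slope(y, a):
    """dx/dy from the first integral, written with 2a = 1/K^2."""
    return math.sqrt(y / (2 * a - y))

a = 1.3          # arbitrary illustrative value
h = 1e-5

slope_errors = []
for y in (0.2, 0.9, 1.7):
    numeric = (x_of_y(y + h, a) - x_of_y(y - h, a)) / (2 * h)
    slope_errors.append(abs(numeric - slope(y, a)))

param_errors = []
for theta in (0.3, 1.0, 2.5):
    x_param = a * (theta - math.sin(theta))
    y_param = a * (1 - math.cos(theta))
    param_errors.append(abs(x_of_y(y_param, a) - x_param))
```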

x-independent 

In Eqs. (16.15) and (16.16) there was a special case for which the dependent variable was missing from F. That made 
the equation much simpler. What if the independent variable is missing? Does that provide a comparable simplification? 
Yes, but it’s trickier to find. 


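$$I[y] = \int dx\,F(x, y, y') \;\longrightarrow\; \frac{\delta I}{\delta y} = \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \tag{16.18}$$

Use the chain rule to differentiate $F$ with respect to $x$.

$$\frac{dF}{dx} = \frac{\partial F}{\partial x} + \frac{dy}{dx}\frac{\partial F}{\partial y} + \frac{dy'}{dx}\frac{\partial F}{\partial y'} \tag{16.19}$$

Multiply the Lagrange equation (16.18) by $y'$ to get

$$y'\frac{\partial F}{\partial y} - y'\frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

Now substitute the term $y'(\partial F/\partial y)$ from the preceding equation (16.19).

$$\frac{dF}{dx} - \frac{\partial F}{\partial x} - \frac{dy'}{dx}\frac{\partial F}{\partial y'} - y'\frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

The last two terms are the derivative of a product.

$$\frac{dF}{dx} - \frac{\partial F}{\partial x} - \frac{d}{dx}\left[y'\frac{\partial F}{\partial y'}\right] = 0 \tag{16.20}$$

If the function $F$ has no explicit $x$ in it, the second term is zero, and the equation is now a derivative

$$\frac{d}{dx}\left[F - y'\frac{\partial F}{\partial y'}\right] = 0 \qquad\text{and}\qquad y'\frac{\partial F}{\partial y'} - F = C \tag{16.21}$$

This is already down to a first order differential equation. The combination $y'F_{y'} - F$ that appears on the left of the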
second equation is important. It's called the Hamiltonian. 
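
As an illustration of Eq. (16.21), take the brachistochrone integrand of Eq. (16.13) with the constant $1/\sqrt{2g}$ dropped, $F = \sqrt{1 + y'^2}/\sqrt{y}$, which has no explicit $x$. The sketch below evaluates $y'F_{y'} - F$ at several points of a cycloid, using the standard parametrization $x = a(\theta - \sin\theta)$, $y = a(1 - \cos\theta)$ (quoted as an assumption), and checks that the value does not change along the curve. The names and the value of $a$ are illustrative.

```python
import math

def hamiltonian(y, yp):
    """y' dF/dy' - F for F = sqrt(1 + y'^2) / sqrt(y)."""
    f = math.sqrt(1 + yp * yp) / math.sqrt(y)
    df_dyp = yp / (math.sqrt(y) * math.sqrt(1 + yp * yp))
    return yp * df_dyp - f

a = 0.8          # arbitrary illustrative value
values = []
for theta in (0.4, 1.1, 2.0, 2.9):
    y = a * (1 - math.cos(theta))
    yp = math.sin(theta) / (1 - math.cos(theta))   # dy/dx = (dy/dtheta) / (dx/dtheta)
    values.append(hamiltonian(y, yp))

spread = max(values) - min(values)
```

For this $F$ the combination simplifies to $-1/\sqrt{y(1 + y'^2)}$, and on the cycloid $y(1 + y'^2) = 2a$, so each entry of `values` should equal $-1/\sqrt{2a}$.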

16.4 Fermat’s Principle 

Fermat's principle of least time provides a formulation of geometrical optics. When you don’t know about the wave 
nature of light, or if you ignore that aspect of the subject, it seems that light travels in straight lines — at least until it 
hits something. Of course this isn't fully correct, because when light hits a piece of glass it refracts and its path bends 
and follows Snell's equation. All of these properties of light can be described by Fermat's statement that the path light 
follows will be that path that takes the least* time. 


$$T = \int dt = \int\frac{d\ell}{v} = \frac{1}{c}\int n\,d\ell \tag{16.22}$$

The total travel time is the integral of the distance $d\ell$ over the speed (itself a function of position). The index of
refraction is $n = c/v$, where $c$ is the speed of light in vacuum, so I can rewrite the travel time in the above form using
$n$. The integral $\int n\,d\ell$ is called the optical path.

* Not always least. This just requires the first derivative to be zero; the second derivative is addressed in section
16.10. "Fermat's principle of stationary time" may be more accurate, but "Fermat's principle of least time" is entrenched
in the literature.

From this idea it is very easy to derive the rule for reflection at a surface: angle of incidence equals angle of
reflection. It is equally easy to derive Snell's law. (See problem 16.5.) I'll look at an example that would be difficult to 
do by any means other than Fermat’s principle: Do you remember what an asphalt road looks like on a very hot day? If 
you are driving a car on a black colored road you may see the road in the distance appear to be covered by what looks 
like water. It has a sort of mirror-like sheen that is always receding from you — the "hot road mirage”. You can never 
catch up to it. This happens because the road is very hot and it heats the air next to it, causing a strong temperature 
gradient near the surface. The density of the air decreases with rising temperature because the pressure is constant. 
That in turn means that the index of refraction will decrease with the rising temperature near the surface. The index 
will then be an increasing function of the distance above ground level, n = f(y), and the travel time of light will depend 
on the path taken. 


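$$\int n\,d\ell = \int f(y)\,d\ell = \int f(y)\sqrt{1 + x'^2}\,dy = \int f(y)\sqrt{1 + y'^2}\,dx \tag{16.23}$$

What is $f(y)$? I'll leave that for a moment and then after carrying the calculation through for a while I can pick an $f$
that is both plausible and easy to manipulate.

[Figure: a light ray above the road, with $x$ along the road and $y$ the height above it.]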

Should x be the independent variable, or y? Either should work, and I chose y because it seemed likely to be 
easier. (See problem 16.6 however.) The integrand then does not contain the dependent variable x. 


$$\text{minimize}\quad \int n\,d\ell = \int f(y)\sqrt{1 + x'^2}\,dy \;\Longrightarrow\; \frac{d}{dy}\frac{\partial}{\partial x'}\Bigl[f(y)\sqrt{1 + x'^2}\Bigr] = 0 \;\Longrightarrow\; f(y)\frac{x'}{\sqrt{1 + x'^2}} = C$$

Solve for $x'$ to get

$$f(y)^2x'^2 = C^2(1 + x'^2) \qquad\text{so}\qquad x' = \frac{dx}{dy} = \frac{C}{\sqrt{f(y)^2 - C^2}} \tag{16.24}$$

At this point pick a form for the index of refraction that will make the integral easy and will still plausibly represent
reality. The index increases gradually above the road surface, and the simplest function works: $f(y) = n_0(1 + \alpha y)$. The
index increases linearly above the surface.


x(y) = j 


c 


-dy = 


C 


dy- 


^/?tg(l + ay ) 2 — C 2 cmo J y/(y + l /a ) 2 — C 2 ja 2 n\ 
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This is an elementary integral. Let u = y + l/a, then u = (C/cmo) cosh#. 

x = f d6 => 6 = < -^(x — xo) =>• y = — — + cosh ((ano/C)(x — xo)) 
cm 0 J C a an 0 

C and Xo are arbitrary constants, and Xo is obviously the center of symmetry of the path. You can relate the other 
constant to the ^-coordinate at that same point: C = no(ayo + 1). 

Because the value of a is small for any real roadway, look at the series expansion of this hyperbolic function to 
the lowest order in a. 

y « y Q + a(x - x 0 ) 2 /2 (16.25) 
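
A quick numerical comparison shows how little is lost in going from the hyperbolic cosine to the parabola of Eq. (16.25). The Python sketch below is only an illustration: the values of `alpha` and `y0` are assumptions picked for convenience, not measured properties of any road, and the function names are made up for this example.

```python
import math

def exact_path(x, x0, y0, alpha):
    """Exact ray height: y = -1/alpha + (C/(alpha*n0)) * cosh((alpha*n0/C)*(x - x0)).

    With C = n0*(alpha*y0 + 1) the index n0 cancels, and C/(alpha*n0) = y0 + 1/alpha.
    cosh(u) - 1 is written as 2*sinh(u/2)**2 to avoid subtracting nearly equal numbers.
    """
    scale = y0 + 1.0 / alpha
    u = (x - x0) / scale
    return y0 + scale * 2.0 * math.sinh(u / 2.0) ** 2

def parabola(x, x0, y0, alpha):
    """Lowest-order approximation, Eq. (16.25)."""
    return y0 + alpha * (x - x0) ** 2 / 2.0

alpha = 1.0e-5   # assumed index gradient, per meter (illustrative only)
y0 = 0.01        # assumed lowest height of the ray, meters
x0 = 0.0

for x in (10.0, 50.0, 100.0):
    ye = exact_path(x, x0, y0, alpha)
    yp = parabola(x, x0, y0, alpha)
    print(f"x = {x:6.1f} m   exact = {ye:.9f}   parabola = {yp:.9f}   diff = {yp - ye:.2e}")
```

For values like these the two curves differ by far less than the height of the ray itself, which is what "lowest order in $\alpha$" is supposed to mean.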

When you look down at the road you can be looking at an image of the sky. The light comes from the sky near the 
horizon down toward the road at an angle of only a degree or two. It then curves up so that it can enter your eye as you 
look along the road. The shimmering surface is a reflection of the distant sky or in this case an automobile — a mirage. 



16.5 Electric Fields 

The energy density in an electric field is $\epsilon_0 E^2/2$. For the static case, this electric field is the gradient of a potential,
$\vec E = -\nabla\phi$. Its total energy in a volume is then

$$W = \frac{\epsilon_0}{2}\int dV\,(\nabla\phi)^2 \tag{16.26}$$

What is the minimum value of this energy? Zero of course, if $\phi$ is a constant. That question is too loosely stated to be
much use, but keep with it for a while and it will be possible to turn it into something more precise and more useful. 


* Donald Collins, Warren Wilson College 

As with any other derivative taken to find a minimum, change the independent variable by a small amount. This time
the variable is the function $\phi$, so really this quantity $W$ can more fully be written as a functional $W[\phi]$ to indicate its
dependence on the potential function.

$$W[\phi + \delta\phi] - W[\phi] = \frac{\epsilon_0}{2}\int dV\,(\nabla\phi + \nabla\delta\phi)^2 - \frac{\epsilon_0}{2}\int dV\,(\nabla\phi)^2
= \frac{\epsilon_0}{2}\int dV\,\big(2\nabla\phi\cdot\nabla\delta\phi + (\nabla\delta\phi)^2\big)$$

Now pull out a vector identity from problem 9.36,

$$\nabla\cdot\big(f\vec g\big) = \nabla f\cdot\vec g + f\,\nabla\cdot\vec g$$

and apply it to the previous line with $f = \delta\phi$ and $\vec g = \nabla\phi$.

$$W[\phi + \delta\phi] - W[\phi] = \epsilon_0\int dV\,\big[\nabla\cdot(\delta\phi\,\nabla\phi) - \delta\phi\,\nabla^2\phi\big] + \frac{\epsilon_0}{2}\int dV\,(\nabla\delta\phi)^2$$

The divergence term is set up to use Gauss's theorem; this is the vector version of integration by parts.

$$W[\phi + \delta\phi] - W[\phi] = \epsilon_0\oint d\vec A\cdot(\nabla\phi)\,\delta\phi - \epsilon_0\int dV\,\delta\phi\,\nabla^2\phi + \frac{\epsilon_0}{2}\int dV\,(\nabla\delta\phi)^2 \tag{16.27}$$

If the value of the potential $\phi$ is specified everywhere on the boundary, then I'm not allowed to change it in the process
of finding the change in $W$. That means that $\delta\phi$ vanishes on the boundary. That makes the boundary term, the $d\vec A$
integral, vanish. Its integrand is zero everywhere on the surface of integration.

In looking for a minimum energy I want to set the first derivative to zero, and that's the coefficient of the term
linear in $\delta\phi$.

$$\frac{\delta W}{\delta\phi} = -\epsilon_0\nabla^2\phi = 0$$

The function that produces the minimum value of the total energy (with these fixed boundary conditions) is the one that
satisfies Laplace's equation. Is it really a minimum? Yes. In this instance it's very easy to see that. The extra term in
the change of $W$ is $\frac{\epsilon_0}{2}\int dV\,(\nabla\delta\phi)^2$. That is positive no matter what $\delta\phi$ is.
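
The same statement can be tried numerically in one dimension, where Laplace's equation says the potential is a straight line between the boundary values. The Python sketch below is a hedged illustration and not part of the derivation: it uses units with $\epsilon_0 = 1$, an arbitrary grid, and an arbitrary perturbation that vanishes at both ends. It checks that the change in the energy equals the quadratic term alone, so it is positive.

```python
import math

def energy(phi, h):
    """Discrete version of W = (1/2) * integral of (d phi/dx)^2 dx, in units with epsilon_0 = 1."""
    return 0.5 * sum((phi[i + 1] - phi[i]) ** 2 / h for i in range(len(phi) - 1))

N = 200                      # number of intervals (arbitrary choice)
h = 1.0 / N
xs = [i * h for i in range(N + 1)]

# Solution of Laplace's equation in one dimension with phi(0) = 0, phi(1) = 1.
phi_laplace = list(xs)

# A perturbation that vanishes at both boundaries, as the derivation requires.
delta = [0.05 * math.sin(3 * math.pi * x) for x in xs]
delta[0] = 0.0
delta[-1] = 0.0

phi_perturbed = [p + d for p, d in zip(phi_laplace, delta)]

w0 = energy(phi_laplace, h)
w1 = energy(phi_perturbed, h)
second_order = energy(delta, h)   # the (grad delta phi)^2 term alone

print("W[phi]           =", w0)
print("W[phi + delta]   =", w1)
print("difference       =", w1 - w0)
print("quadratic term   =", second_order)
```

The term linear in the perturbation drops out because the unperturbed field is constant and the perturbation is zero at the ends, which is the discrete form of the boundary-term argument above.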

That the correct potential function is the one having the minimum energy allows for an efficient approximate
method to solve electrostatic problems. I'll illustrate this using an example borrowed from the Feynman Lectures on
Physics and that you can also solve exactly: What is the capacitance of a length of coaxial cable? (Neglect the edge
effects at the ends of the cable of course.) Let the inner and outer radii be $a$ and $b$, and the length $L$. A linear charge density
$\lambda$ is on the inner conductor (and therefore $-\lambda$ on the inside of the outer conductor). It creates a radial electric field of
size $\lambda/2\pi\epsilon_0 r$. The potential difference between the conductors is

$$\Delta V = \int_a^b dr\,\frac{\lambda}{2\pi\epsilon_0 r} = \frac{\lambda\ln(b/a)}{2\pi\epsilon_0} \tag{16.28}$$

The charge on the inner conductor is $\lambda L$, so $C = Q/\Delta V = 2\pi\epsilon_0 L/\ln(b/a)$, where $\Delta V = V_b - V_a$.

The total energy satisfies $W = C\Delta V^2/2$, so for the given potential difference, knowing the energy is the same
as knowing the capacitance.

This exact solution provides a laboratory to test the efficacy of a variational approximation for the same problem. 
The idea behind this method is that you assume a form for the solution $\phi(r)$. This assumed form must satisfy the
boundary conditions, but it need not satisfy Laplace's equation. It should also have one or more free parameters, the
more the better up to the limits of your patience. Now compute the total energy that this trial function implies and then
use the free parameters to minimize it. This function with minimum energy is the best approximation to the correct
potential among all those with your assumed trial function.

Let the potential at $r = a$ be $V_a$ and at $r = b$ it is $V_b$. An example function that satisfies these conditions is

$$\phi(r) = V_a + (V_b - V_a)\frac{r - a}{b - a} \tag{16.29}$$

The electric field implied by this is $\vec E = -\nabla\phi = \hat r\,(V_a - V_b)/(b - a)$, a constant radial component. From (16.26), the
energy is


$$W = \frac{\epsilon_0}{2}\int_a^b L\,2\pi r\,dr\left(\frac{d\phi}{dr}\right)^2
= \frac{\epsilon_0}{2}\int_a^b L\,2\pi r\,dr\left(\frac{V_b - V_a}{b - a}\right)^2
= \frac{1}{2}\pi L\epsilon_0\,\frac{b + a}{b - a}\,\Delta V^2$$

Set this to $C\Delta V^2/2$ to get $C$ and you have

$$C_{\text{approx}} = \pi L\epsilon_0\,\frac{b + a}{b - a}$$

How does this compare to the exact answer, $2\pi\epsilon_0 L/\ln(b/a)$? Let $x = b/a$.

$$\frac{C_{\text{approx}}}{C} = \frac{1}{2}\,\frac{b + a}{b - a}\,\ln(b/a) = \frac{1}{2}\,\frac{x + 1}{x - 1}\,\ln x$$

x:      1.1     1.2    1.5    2.0   3.0   10.0
ratio:  1.0007  1.003  1.014  1.04  1.10  1.41

Assuming a constant magnitude electric field in the region between the two cylinders is clearly not correct, but 
this estimate of the capacitance gives a remarkably good result even when the ratio of the radii is two or three. This is
true even though I didn't even put in a parameter with which to minimize the energy. How much better will the result 
be if I do? 
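
The numbers in the table are easy to reproduce. The following Python sketch is an illustration only: it picks units with $a = 1$, $\epsilon_0 = 1$, $L = 1$, $\Delta V = 1$, and the function name is made up. It does the energy integral for the linear trial potential by the midpoint rule and compares the resulting capacitance with the exact one.

```python
import math

def capacitance_ratio_linear(x, n=1000):
    """Ratio C_approx / C_exact for the linear trial potential, with x = b/a.

    Units: a = 1, epsilon_0 = 1, L = 1, Delta V = 1. The energy integral
    W = (1/2) * int 2*pi*r * (d phi/dr)^2 dr is done with the midpoint rule.
    """
    a, b = 1.0, x
    field = 1.0 / (b - a)              # constant |d phi/dr| for the linear trial function
    h = (b - a) / n
    w = 0.0
    for i in range(n):
        r = a + (i + 0.5) * h
        w += 0.5 * 2.0 * math.pi * r * field ** 2 * h
    c_approx = 2.0 * w                 # from W = C * (Delta V)^2 / 2 with Delta V = 1
    c_exact = 2.0 * math.pi / math.log(b / a)
    return c_approx / c_exact

for x in (1.1, 1.2, 1.5, 2.0, 3.0, 10.0):
    print(f"x = {x:5.1f}   ratio = {capacitance_ratio_linear(x):.4f}")
```

Because the integrand is linear in $r$ for this trial function, the midpoint rule introduces no error beyond rounding, so the output can be compared directly with the closed form above.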

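Instead of a linear approximation for the potential, use a quadratic.

$$\phi(r) = V_a + \alpha(r - a) + \beta(r - a)^2, \qquad\text{with}\quad \phi(b) = V_b$$

Solve for $\alpha$ in terms of $\beta$ and you have, after a little manipulation,

$$\phi(r) = V_a + \Delta V\,\frac{r - a}{b - a} + \beta(r - a)(r - b) \tag{16.30}$$

Compute the energy from this.

$$W = \frac{\epsilon_0}{2}\int_a^b L\,2\pi r\,dr\left[\frac{\Delta V}{b - a} + \beta(2r - a - b)\right]^2$$

Rearrange this for easier manipulation by defining $2\beta = \gamma\Delta V/(b - a)$ and $c = (a + b)/2$ then

$$\begin{aligned}
W &= \frac{\epsilon_0}{2}\,2L\pi\left(\frac{\Delta V}{b - a}\right)^2\int_a^b \big((r - c) + c\big)\,dr\,\big[1 + \gamma(r - c)\big]^2 \\
  &= \frac{\epsilon_0}{2}\,2L\pi\left(\frac{\Delta V}{b - a}\right)^2\Big[c(b - a) + \gamma(b - a)^3/6 + c\gamma^2(b - a)^3/12\Big]
\end{aligned}$$

$\gamma$ is a free parameter in this calculation, replacing the original $\beta$. To minimize this energy, set the derivative $dW/d\gamma = 0$,
resulting in the value $\gamma = -1/c$. At this value of $\gamma$ the energy is

$$W = \frac{\epsilon_0}{2}\,2L\pi\left(\frac{\Delta V}{b - a}\right)^2\left[\frac{1}{2}\big(b^2 - a^2\big) - \frac{(b - a)^3}{6(b + a)}\right] \tag{16.31}$$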

Except for the factor of $\Delta V^2/2$ this is the new estimate of the capacitance, and to see how good it is, again take the
ratio of this estimate to the exact value and let $x = b/a$.

$$\frac{C_{\text{approx}}}{C} = \ln x\;\frac{1}{2}\,\frac{x + 1}{x - 1}\left[1 - \frac{(x - 1)^2}{3(x + 1)^2}\right] \tag{16.32}$$

x:      1.1         1.2       1.5      2.0     3.0     10.0
ratio:  1.00000046  1.000006  1.00015  1.0012  1.0071  1.093

For only a one parameter adjustment, this provides very high accuracy. This sort of technique is the basis for many 
similar procedures in this and other contexts, especially in quantum mechanics. 
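
The one-parameter minimization can also be done by machine. The Python sketch below is an illustration under assumed units ($a = 1$, so $b = x$), with made-up function names. It takes the bracketed factor of the energy as a function of $\gamma$, finds its minimum from three samples (exact here, because the energy is quadratic in $\gamma$), and forms the ratio to the exact capacitance.

```python
import math

def bracket(gamma, a, b):
    """The bracketed factor in W for the quadratic trial function, as a function of gamma."""
    c = (a + b) / 2.0
    d = b - a
    return c * d + gamma * d ** 3 / 6.0 + c * gamma ** 2 * d ** 3 / 12.0

def best_gamma(a, b):
    """Vertex of the parabola through three samples; exact because W is quadratic in gamma."""
    w_minus, w_zero, w_plus = bracket(-1.0, a, b), bracket(0.0, a, b), bracket(1.0, a, b)
    curvature = (w_plus + w_minus) / 2.0 - w_zero
    slope = (w_plus - w_minus) / 2.0
    return -slope / (2.0 * curvature)

def capacitance_ratio_quadratic(x):
    """C_approx / C_exact at the minimizing gamma, with a = 1 and b = x."""
    a, b = 1.0, x
    g = best_gamma(a, b)
    return bracket(g, a, b) * math.log(x) / (b - a) ** 2

for x in (1.1, 1.2, 1.5, 2.0, 3.0, 10.0):
    g = best_gamma(1.0, x)
    print(f"x = {x:5.1f}   gamma*c = {g * (1.0 + x) / 2.0:+.6f}   ratio = {capacitance_ratio_quadratic(x):.8f}")
```

The printed product $\gamma c$ should come out as $-1$ in every row, in agreement with $\gamma = -1/c$. With more than one free parameter the same idea applies, though you would then want a general-purpose minimizer instead of this three-point shortcut.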


16.6 Discrete Version 

There is another way to find the functional derivative, one that more closely parallels the ordinary partial derivative. It is 
due to Euler, and he found it first, before Lagrange’s discovery of the treatment that I’ve spent all this time on. Euler’s 
method is perhaps more intuitive than Lagrange's, but it is not as easy to extend it to more than one dimension and it 
doesn't easily lend itself to the powerful manipulative tools that Lagrange's method does. This alternate method starts 
by noting that an integral is the limit of a sum. Go back to the sum and perform the derivative on it, finally taking a 
limit to get an integral. This turns the problem into a finite-dimensional one with ordinary partial derivatives. You have 
to decide which form of numerical integration to use, and I'll pick the trapezoidal rule, Eq. (11.15), with a constant 
interval. Other choices work too, but I think this entails less fuss. You don’t see this approach as often as Lagrange's 
because it is harder to manipulate the equations with Euler’s method, and the notation can become quite cumbersome. 
The trapezoidal rule for an integral is just the following picture, and all that you have to handle with any care are the 
endpoints of the integral. 


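$$\int_a^b dx\,f(x) = \lim_{N\to\infty}\left[\frac{1}{2}f(a) + \sum_{k=1}^{N-1} f(x_k) + \frac{1}{2}f(b)\right]\Delta,
\qquad\text{where}\quad x_k = a + k\Delta,\quad 0 \le k \le N,\quad \frac{b - a}{N} = \Delta$$

[Figure: one trapezoid of the rule, with heights $y_k$ and $y_{k+1}$ at the points $x_k$ and $x_{k+1}$.]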

The integral Eq. (16.5) involves y' , so in the same spirit, approximate this by the centered difference. 


$$y'_k = y'(x_k) \approx \big(y(x_{k+1}) - y(x_{k-1})\big)/2\Delta$$

This evaluates the derivative at each of the coordinates $\{x_k\}$ instead of between them.

The discrete form of (16.5) is now

$$I_{\text{discrete}} = \frac{\Delta}{2}F\big(a, y(a), y'(a)\big)
+ \sum_{k=1}^{N-1} F\big(x_k, y(x_k), (y(x_{k+1}) - y(x_{k-1}))/2\Delta\big)\Delta
+ \frac{\Delta}{2}F\big(b, y(b), y'(b)\big)$$

Not quite yet. What about $y'(a)$ and $y'(b)$? The endpoints $y(a)$ and $y(b)$ aren't changing, but that doesn't mean that
the slope there is fixed. At these two points, I can't use the centered difference scheme for the derivative, I'll have to
use an asymmetric form to give

$$I_{\text{discrete}} = \frac{\Delta}{2}F\big(a, y(a), (y(x_1) - y(x_0))/\Delta\big) + \frac{\Delta}{2}F\big(b, y(b), (y(x_N) - y(x_{N-1}))/\Delta\big)
+ \sum_{k=1}^{N-1} F\big(x_k, y(x_k), (y(x_{k+1}) - y(x_{k-1}))/2\Delta\big)\Delta \tag{16.33}$$


When you keep the endpoints fixed, this is a function of $N - 1$ variables, $\{y_k = y(x_k)\}$ for $1 \le k \le N - 1$, and to
find the minimum or maximum you simply take the partial derivative with respect to each of them. It is not a function of
any of the $\{x_k\}$ because those are defined and fixed by the partition $x_k = a + k\Delta$. The clumsy part is keeping track of
the notation. When you differentiate with respect to a particular $y_\ell$, most of the terms in the sum (16.33) don't contain
it. There are only three terms in the sum that contribute: $\ell$ and $\ell \pm 1$. In the figure $N = 5$, and the $\ell = 2$ coordinate
($y_2$) is being changed. For all the indices $\ell$ except the two next to the endpoints (1 and $N - 1$), this is

$$\frac{\partial}{\partial y_\ell}I_{\text{discrete}} = \frac{\partial}{\partial y_\ell}\Big[
F\big(x_{\ell-1}, y_{\ell-1}, (y_\ell - y_{\ell-2})/2\Delta\big)
+ F\big(x_\ell, y_\ell, (y_{\ell+1} - y_{\ell-1})/2\Delta\big)
+ F\big(x_{\ell+1}, y_{\ell+1}, (y_{\ell+2} - y_\ell)/2\Delta\big)\Big]\Delta$$



An alternate standard notation for partial derivatives will help to keep track of the manipulations: 


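$D_1F$ is the derivative with respect to the first argument, and similarly $D_2F$ and $D_3F$ for the second and third.

The above derivative is then

$$\Big[D_2F\big(x_\ell, y_\ell, (y_{\ell+1} - y_{\ell-1})/2\Delta\big)
+ \frac{1}{2\Delta}\Big[D_3F\big(x_{\ell-1}, y_{\ell-1}, (y_\ell - y_{\ell-2})/2\Delta\big) - D_3F\big(x_{\ell+1}, y_{\ell+1}, (y_{\ell+2} - y_\ell)/2\Delta\big)\Big]\Big]\Delta \tag{16.34}$$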


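There is no $D_1F$ because the $x_\ell$ is essentially an index.

If you now take the limit $\Delta \to 0$, the third argument in each function returns to the derivative $y'$ evaluated at
various $x_k$s:

$$\begin{aligned}
&\Big[D_2F\big(x_\ell, y_\ell, y'_\ell\big) + \frac{1}{2\Delta}\Big[D_3F\big(x_{\ell-1}, y_{\ell-1}, y'_{\ell-1}\big) - D_3F\big(x_{\ell+1}, y_{\ell+1}, y'_{\ell+1}\big)\Big]\Big]\Delta \\
&\quad = \Big[D_2F\big(x_\ell, y(x_\ell), y'(x_\ell)\big) + \frac{1}{2\Delta}\Big[D_3F\big(x_{\ell-1}, y(x_{\ell-1}), y'(x_{\ell-1})\big) - D_3F\big(x_{\ell+1}, y(x_{\ell+1}), y'(x_{\ell+1})\big)\Big]\Big]\Delta
\end{aligned} \tag{16.35}$$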


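Now take the limit that $\Delta \to 0$, and the last part is precisely the definition of (minus) the derivative with respect to $x$.
This then becomes

$$\frac{1}{\Delta}\frac{\partial}{\partial y_\ell}I_{\text{discrete}} \longrightarrow D_2F\big(x_\ell, y_\ell, y'_\ell\big) - \frac{d}{dx}D_3F\big(x_\ell, y_\ell, y'_\ell\big) \tag{16.36}$$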

Translate this into the notation that I've been using and you have Eq. (16.10). Why did I divide by $\Delta$ in the final step?
That's the equivalent of looking for the coefficient of both the $dx$ and the $\delta y$ in Eq. (16.10). It can be useful to retain
the discrete approximation of Eq. (16.34) or (16.35) to the end of the calculation. This allows you to do numerical
calculations in cases where the analytic equations are too hard to manipulate.

Again, not quite yet. The two cases $\ell = 1$ and $\ell = N - 1$ have to be handled separately. You need to go back
to Eq. (16.33) to see how they work out. The factors show up in different places, but the final answer is the same. See
problem 16.15.

It is curious that when formulating the problem this way, you don't seem to need a partial integration. The result
came out automatically. Would that be true with some integration method other than the trapezoidal rule? See
problem 16.16.
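
To see the discrete derivative of Eq. (16.34) at work, here is a small Python sketch. Everything specific in it is an assumption made for illustration: the integrand $F = y'^2/2 + y^2/2$ on $0 \le x \le 1$ with $y(0) = 0$, $y(1) = 1$, the grid size, and the function names. For this $F$ the Euler-Lagrange equation is $y'' = y$, with solution $\sinh x/\sinh 1$. The code evaluates Eq. (16.34) divided by $\Delta$ at the interior points $2 \le \ell \le N - 2$, once on the exact solution and once on a straight line that satisfies the endpoint conditions but not the differential equation.

```python
import math

def D2F(x, y, yp):
    """Derivative of the sample integrand F = y'^2/2 + y^2/2 with respect to y."""
    return y

def D3F(x, y, yp):
    """Derivative of the same F with respect to y'."""
    return yp

def residual(ys, delta, l, a=0.0):
    """Eq. (16.34) divided by Delta, for an interior index 2 <= l <= N-2."""
    def x(k):
        return a + k * delta

    def slope(k):
        return (ys[k + 1] - ys[k - 1]) / (2.0 * delta)

    return (D2F(x(l), ys[l], slope(l))
            + (D3F(x(l - 1), ys[l - 1], slope(l - 1))
               - D3F(x(l + 1), ys[l + 1], slope(l + 1))) / (2.0 * delta))

N = 100
delta = 1.0 / N
xs = [k * delta for k in range(N + 1)]

exact = [math.sinh(x) / math.sinh(1.0) for x in xs]   # solves y'' = y, y(0) = 0, y(1) = 1
line = list(xs)                                        # right endpoints, wrong equation

worst_exact = max(abs(residual(exact, delta, l)) for l in range(2, N - 1))
worst_line = max(abs(residual(line, delta, l)) for l in range(2, N - 1))
print("largest residual, exact solution:", worst_exact)
print("largest residual, straight line :", worst_line)
```

The residual on the exact solution is not zero, only small, and it shrinks like $\Delta^2$ as the grid is refined. The two indices next to the endpoints are left out here because they need the separate treatment mentioned above.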

16.7 Classical Mechanics

The calculus of variations provides a way to reformulate Newton’s equations of mechanics. The results produce efficient 
techniques to set up complex problems and they give insights into the symmetries of the systems. They also provide 
alternate directions in which to generalize the original equations. 

Start with one particle subject to a force, and the force has a potential energy function $U$. Following the traditions
of the subject, denote the particle's kinetic energy by $T$. Picture this first in rectangular coordinates, where $T = mv^2/2$
and the potential energy is $U(x_1, x_2, x_3)$. The functional $S$ depends on the path $[x_1(t), x_2(t), x_3(t)]$ from the initial to
the final point. The integrand is the Lagrangian, $L = T - U$.

$$S[\vec r\,] = \int_{t_1}^{t_2} L\big(\vec r, \dot{\vec r}\,\big)\,dt, \qquad\text{where}\qquad
L = T - U = \frac{m}{2}\big(\dot x_1^2 + \dot x_2^2 + \dot x_3^2\big) - U(x_1, x_2, x_3) \tag{16.37}$$


The statement that the functional derivative is zero is 


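$$\frac{\delta S}{\delta x_k} = \frac{\partial L}{\partial x_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial\dot x_k}\right)
= -\frac{\partial U}{\partial x_k} - \frac{d}{dt}\big(m\dot x_k\big)$$

Set this to zero and you have

$$m\ddot x_k = -\frac{\partial U}{\partial x_k} \qquad\text{or}\qquad m\frac{d^2\vec r}{dt^2} = \vec F \tag{16.38}$$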


That this integral of Ldt has a zero derivative is F = ma. Now what? This may be elegant, but does it accomplish 
anything? The first observation is that when you state the problem in terms of this integral it is independent of the 
coordinate system. If you specify a path in space, giving the velocity at each point along the path, the kinetic energy and 
the potential energy are well-defined at each point on the path and the integral S is too. You can now pick whatever 
bizarre coordinate system that you want in order to do the computation of the functional derivative. Can’t you do this 
with F = ma? Yes, but computing an acceleration in an odd coordinate system is a lot more work than computing a
velocity. A second advantage will be that it’s easier to handle constraints by this method. The technique of Lagrange 
multipliers from section 8.12 will apply here too. 

Do the simplest example: plane polar coordinates. The kinetic energy is 


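$$T = \frac{m}{2}\big(\dot r^2 + r^2\dot\phi^2\big)$$

The potential energy is a function $U$ of $r$ and $\phi$. With the Lagrangian defined as $T - U$, the variational derivative
determines the equations of motion to be

$$S[r, \phi] = \int_{t_1}^{t_2} L\big(r(t), \phi(t)\big)\,dt \longrightarrow$$

$$\frac{\delta S}{\delta r} = \frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial\dot r} = mr\dot\phi^2 - \frac{\partial U}{\partial r} - m\ddot r = 0$$

$$\frac{\delta S}{\delta\phi} = \frac{\partial L}{\partial\phi} - \frac{d}{dt}\frac{\partial L}{\partial\dot\phi} = -\frac{\partial U}{\partial\phi} - m\frac{d}{dt}\big(r^2\dot\phi\big) = 0$$

These are the components of $\vec F = m\vec a$ in polar coordinates. If the potential energy is independent of $\phi$, the second
equation says that the angular momentum, $mr^2\dot\phi$, is conserved.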

What do the discrete approximate equations (16.34) or (16.35) look like in this context? Look at the case of 
one-dimensional motion to understand an example. The Lagrangian is 

$$L = \frac{m}{2}\dot x^2 - U(x)$$

Take the expression in Eq. (16.34) and set it to zero.

$$-\frac{dU}{dx}(x_\ell) + \frac{1}{2\Delta}\big[m(x_\ell - x_{\ell-2})/2\Delta - m(x_{\ell+2} - x_\ell)/2\Delta\big] = 0$$

or

$$m\,\frac{x_{\ell+2} - 2x_\ell + x_{\ell-2}}{(2\Delta)^2} = -\frac{dU}{dx}(x_\ell) \tag{16.39}$$


This is the discrete approximation to the second derivative, as in Eq. (11.12). 
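
Equation (16.39) can be used directly as a stepping rule: solve it for the latest position in terms of the two before it. The Python sketch below does this for a harmonic oscillator. It is an illustration with assumed values ($m = k = 1$, a particle released from rest), the step `h` stands for the spacing $2\Delta$ between the points that the equation couples, and the function name is made up.

```python
import math

def integrate(force, m, x0, x1, h, steps):
    """Step Eq. (16.39) forward: m*(x[n+1] - 2*x[n] + x[n-1])/h**2 = force(x[n]).

    Here h stands for the spacing 2*Delta between the points that the equation couples,
    and force(x) = -dU/dx. Returns the list of positions x[0], x[1], ..., x[steps].
    """
    xs = [x0, x1]
    for _ in range(steps - 1):
        xs.append(2.0 * xs[-1] - xs[-2] + h * h * force(xs[-1]) / m)
    return xs

# Harmonic oscillator U = k*x^2/2 with m = k = 1 (assumed values), started at rest at x = 1.
m, k = 1.0, 1.0
h = 0.01
steps = 1000
x0 = 1.0
x1 = x0 - 0.5 * h * h * k * x0 / m      # second-order Taylor step for zero initial velocity

xs = integrate(lambda x: -k * x, m, x0, x1, h, steps)
worst = max(abs(x - math.cos(n * h)) for n, x in enumerate(xs))
print("largest deviation from cos(t) over 0 <= t <= 10:", worst)
```

The numerical solution oscillates at a slightly different frequency from the exact one, so the deviation grows slowly with time. Halving `h` should cut it by about a factor of four.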

16.8 Endpoint Variation 

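Following Eq. (16.6) I restricted the variation of the path so that the endpoints are kept fixed. What if you don't? As
before, keep terms to the first order, so that for example $\Delta t_b\,\delta y$ is out. Because the most common application of this
method involves integrals with respect to time, I'll use that integration variable.

$$\begin{aligned}
\Delta S &= \int_{t_a + \Delta t_a}^{t_b + \Delta t_b} dt\,L\big(t, y(t) + \delta y(t), \dot y(t) + \delta\dot y(t)\big) - \int_{t_a}^{t_b} dt\,L\big(t, y(t), \dot y(t)\big) \\
&= \int_{t_a + \Delta t_a}^{t_b + \Delta t_b} dt\left[L(t, y, \dot y) + \frac{\partial L}{\partial y}\delta y + \frac{\partial L}{\partial\dot y}\delta\dot y\right] - \int_{t_a}^{t_b} dt\,L\big(t, y(t), \dot y(t)\big) \\
&= \left[\int_{t_b}^{t_b + \Delta t_b} - \int_{t_a}^{t_a + \Delta t_a}\right] dt\,L(t, y, \dot y) + \int_{t_a}^{t_b} dt\left[\frac{\partial L}{\partial y}\delta y + \frac{\partial L}{\partial\dot y}\delta\dot y\right]
\end{aligned}$$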

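Drop quadratic terms in the second line: anything involving $(\delta y)^2$ or $\delta y\,\delta\dot y$ or $(\delta\dot y)^2$. Similarly, drop terms such as $\Delta t_a\,\delta y$
in going to the third line. Do a partial integration on the last term

$$\int_{t_a}^{t_b} dt\,\frac{\partial L}{\partial\dot y}\frac{d\,\delta y}{dt} = \frac{\partial L}{\partial\dot y}\,\delta y\,\bigg|_{t_a}^{t_b} - \int_{t_a}^{t_b} dt\,\frac{d}{dt}\left(\frac{\partial L}{\partial\dot y}\right)\delta y(t) \tag{16.40}$$

The first two terms, with the $\Delta t_a$ and $\Delta t_b$, are to first order

$$\left[\int_{t_b}^{t_b + \Delta t_b} - \int_{t_a}^{t_a + \Delta t_a}\right] dt\,L(t, y, \dot y) = L\big(t_b, y(t_b), \dot y(t_b)\big)\Delta t_b - L\big(t_a, y(t_a), \dot y(t_a)\big)\Delta t_a$$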


This produces an expression for AS 


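$$\begin{aligned}
\Delta S = {}& L\big(t_b, y(t_b), \dot y(t_b)\big)\Delta t_b - L\big(t_a, y(t_a), \dot y(t_a)\big)\Delta t_a \\
&+ \frac{\partial L}{\partial\dot y}(t_b)\,\delta y(t_b) - \frac{\partial L}{\partial\dot y}(t_a)\,\delta y(t_a)
+ \int_{t_a}^{t_b} dt\left[\frac{\partial L}{\partial y} - \frac{d}{dt}\left(\frac{\partial L}{\partial\dot y}\right)\right]\delta y
\end{aligned} \tag{16.41}$$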


Up to this point the manipulation was straightforward, though you had to keep all the algebra in good order.
Now there are some rearrangements needed that are not all that easy to anticipate, adding and subtracting some terms.

Start by looking at the two terms involving $\Delta t_b$ and $\delta y(t_b)$, the first and third terms. The change in the position
of this endpoint is not simply $\delta y(t_b)$. Even if $\delta y$ is identically zero the endpoint will change in both the $t$-direction and
in the $y$-direction because of the slope of the curve ($\dot y$) and the change in the value of $t$ at the endpoint ($\Delta t_b$).

The total movement of the endpoint at $t_b$ is horizontal by the amount $\Delta t_b$, and it is vertical by the amount
$(\delta y + \dot y\Delta t_b)$. To incorporate this, add and subtract this second term, $\dot y\Delta t$, in order to produce this combination as a
coefficient.
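
$$\begin{aligned}
L\big(t_b, y(t_b), \dot y(t_b)\big)\Delta t_b + \frac{\partial L}{\partial\dot y}(t_b)\,\delta y(t_b)
&= \left[L(t_b)\Delta t_b - \frac{\partial L}{\partial\dot y}(t_b)\,\dot y(t_b)\Delta t_b\right]
 + \left[\frac{\partial L}{\partial\dot y}(t_b)\,\dot y(t_b)\Delta t_b + \frac{\partial L}{\partial\dot y}(t_b)\,\delta y(t_b)\right] \\
&= \left[L - \frac{\partial L}{\partial\dot y}\dot y\right]\Delta t_b + \frac{\partial L}{\partial\dot y}\big[\dot y\Delta t_b + \delta y\big]
\end{aligned} \tag{16.42}$$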


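Do the same thing at $t_a$, keeping the appropriate signs. Then denote

$$p = \frac{\partial L}{\partial\dot y}, \qquad H = p\dot y - L, \qquad \Delta y = \delta y + \dot y\Delta t$$

$H$ is the Hamiltonian and Eq. (16.41) becomes Noether's theorem.

$$\Delta S = \Big[p\Delta y - H\Delta t\Big]_{t_a}^{t_b} + \int_{t_a}^{t_b} dt\left[\frac{\partial L}{\partial y} - \frac{d}{dt}\left(\frac{\partial L}{\partial\dot y}\right)\right]\delta y \tag{16.43}$$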


If the equations of motion are satisfied, the argument of the last integral is zero. The change in $S$ then comes only from
the translation of the endpoint in either the time or space direction. If $\Delta t$ is zero, and $\Delta y$ is the same at the two ends,
you are translating the curve in space, vertically in the graph. Then

$$\Delta S = p\Delta y\,\Big|_{t_a}^{t_b} = \big[p(t_b) - p(t_a)\big]\Delta y = 0$$

If the physical phenomenon described by this equation is invariant under spatial translation, then momentum is conserved.
If you do a translation in time instead of space and $S$ is invariant, then $\Delta t$ is the same at the start and finish, and

$$\Delta S = \big[-H(t_b) + H(t_a)\big]\Delta t = 0$$

This is conservation of energy. Write out what $H$ is for the case of Eq. (16.37).

If you write this theorem in three dimensions and require that the system is invariant under rotations, you get 
conservation of angular momentum. In more complicated systems, especially in field theory, there are other symmetries,
and they in turn lead to conservation laws. For example conservation of charge is associated with a symmetry called
"gauge symmetry" in quantum mechanical systems.
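
The endpoint formula, Eq. (16.43), can be checked on the simplest possible case. For a free particle, $L = m\dot y^2/2$, the true path between two endpoints is a straight line and its action is $m(y_b - y_a)^2/2(t_b - t_a)$. When the equations of motion hold, Eq. (16.43) says that moving the endpoint at $t_b$ changes $S$ by $p\Delta y - H\Delta t$. The Python sketch below tests that with finite differences. The numbers and names in it are arbitrary choices for illustration.

```python
def action(ya, ta, yb, tb, m=1.0):
    """Action of a free particle along the true (straight-line) path between two endpoints."""
    return 0.5 * m * (yb - ya) ** 2 / (tb - ta)

m = 2.0
ya, ta, yb, tb = 0.0, 0.0, 3.0, 1.5       # arbitrary endpoints

velocity = (yb - ya) / (tb - ta)
p = m * velocity                            # p = dL/d(ydot)
H = p * velocity - 0.5 * m * velocity ** 2  # H = p*ydot - L

eps = 1.0e-6
dS_dy = (action(ya, ta, yb + eps, tb, m) - action(ya, ta, yb - eps, tb, m)) / (2 * eps)
dS_dt = (action(ya, ta, yb, tb + eps, m) - action(ya, ta, yb, tb - eps, m)) / (2 * eps)

print("dS/dy_b =", dS_dy, "  p  =", p)
print("dS/dt_b =", dS_dt, "  -H =", -H)
```

The derivative of the action with respect to the final position is the momentum, and with respect to the final time it is minus the energy, as the bracketed term in Eq. (16.43) requires.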

The equation (16.10), in which the variation $\delta y$ had the endpoints fixed, is much like a directional derivative in
multivariable calculus. For a directional derivative you find how a function changes as the independent variable moves 
along some specified direction, and in the variational case the direction was specified to be with functions that were tied 
down at the endpoints. The development of the present section is in the spirit of finding the derivative in all possible 
directions, not just a special set. 

16.9 Kinks 

In all the preceding analysis of minimizing solutions to variational problems, I assumed that everything is differentiable 
and that all the derivatives are continuous. That's not always so, and it is quite possible for a solution to one of 
these problems to have a discontinuous derivative somewhere in the middle. These are more complicated to handle, 
but just because of some extra algebra. An integral such as Eq. (16.5) is perfectly well defined if the integrand has 
a few discontinuities, but the partial integrations leading to the Euler-Lagrange equations are not. You can apply the 
Euler-Lagrange equations only in the intervals between any kinks. 

If you're dealing with a problem of the standard form $I[x] = \int_a^b dt\,L(t, x, \dot x)$ and you want to determine whether
there is a kink along the path, there are some internal boundary conditions that have to hold. Roughly speaking they 
are conservation of momentum and conservation of energy, Eq. (16.44), and you can show this using the results of the 
preceding section on endpoint variations. 



Assume there is a discontinuity in the derivative $\dot x$ at a point in the middle, $t_m$. The equation to solve is still $\delta S/\delta x = 0$,
and for variations of the path that leave the endpoints and the middle point alone you have to satisfy the standard
Euler-Lagrange differential equations on the two segments. Now however you also have to set the variation to zero for
paths that leave the endpoints alone but move the middle point.




Apply Eq. (16.43) to each of the two segments, and assume that the differential equations are already satisfied 
in the two halves. For the sort of variation described in the last two figures, look at the endpoint variations of the two 
segments. They produce 


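$$\delta S = \Big[p\Delta x - H\Delta t\Big]_{t_a}^{t_m^-} + \Big[p\Delta x - H\Delta t\Big]_{t_m^+}^{t_b}
= \big[p\Delta x - H\Delta t\big](t_m^-) - \big[p\Delta x - H\Delta t\big](t_m^+) = 0$$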


These represent the contributions to the variation just above $t_m$ and just below it. This has to vanish for arbitrary $\Delta t$
and $\Delta x$, so it says

$$p(t_m^-) = p(t_m^+) \qquad\text{and}\qquad H(t_m^-) = H(t_m^+) \tag{16.44}$$

These equations, called the Weierstrass-Erdmann conditions, are two equations for the values of the derivative, $\dot x$, on the
two sides of $t_m$. The two equations for the two unknowns may tell you that there is no discontinuity in the derivative,
or if there is then it will dictate the algebraic equations that the two values of $\dot x$ must satisfy. More dimensions means
more equations of course.

There is a class of problems in geometry coming under the general heading of Plateau’s 
Problem. What is the minimal surface that spans a given curve? Here the functional is $\int dA$,
giving the area as a function of the function describing the surface. If the curve is a circle 
in the plane, then the minimum surface is the spanning disk. What if you twist the circle so 
that it does not quite lie in a plane? Now it's a tough problem. What if you have two parallel 
circles? Is the appropriate surface a cylinder? (No.) This subject is nothing more than the 
mathematical question of trying to describe soap bubbles. They’re not all spheres. 

Do kinks happen often? They are rare in problems that usually come up in physics, and they seem to be of more
concern to those who apply these techniques in engineering. For an example that you can verify for yourself however,
construct a wire frame in the shape of a cube. You can bend wire or you can make it out of solder, which is much easier 
to manipulate. Attach a handle to one corner so that you can hold it. Now make a soap solution that you can use to 
blow bubbles. (A trick to make it work better is to add some glycerine.) Now dip the wire cube into the soap and see 
what sort of soap film will result, filling in the surfaces among the wires. It is not what you expect, and has several faces 
that meet at surprising angles. There is a square in the middle. This surface has minimum area among surfaces that
span the cube. 

Example 

In Eq. (16.12), looking at the shortest distance between two points in a plane, I jumped to a conclusion. To minimize
the integral $\int f(y')\,dx$, use the Euler-Lagrange differential equation:

$$\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = -\frac{d}{dx}f'(y') = -f''(y')\,y'' = 0$$

This seems to say that $f'(y')$ is a constant or that $y'' = 0$, implying either way that $y = Ax + B$, a straight line. Now
that you know that solutions can have kinks, you have to look more closely. Take the particular example

$$f(y') = \alpha y'^4 - \beta y'^2, \qquad\text{with}\quad y(0) = 0, \quad\text{and}\quad y(a) = b \tag{16.45}$$

One solution corresponds to $y'' = 0$ and $y(x) = bx/a$. Can there be others?

Apply the conditions of Eq. (16.44) at some point between 0 and $a$. Call it $x_m$, and assume that the derivative
is not continuous. Call the derivatives on the left and right $(y'^-)$ and $(y'^+)$. The first equation is


$$p = \frac{\partial f}{\partial y'} = 4\alpha y'^3 - 2\beta y', \qquad\text{and}\qquad p(x_m^-) = p(x_m^+)$$

$$4\alpha(y'^-)^3 - 2\beta(y'^-) = 4\alpha(y'^+)^3 - 2\beta(y'^+)$$

$$\big[(y'^-) - (y'^+)\big]\big[(y'^+)^2 + (y'^+)(y'^-) + (y'^-)^2 - \beta/2\alpha\big] = 0$$

If the slope is not continuous, the second factor must vanish.

$$(y'^+)^2 + (y'^+)(y'^-) + (y'^-)^2 - \beta/2\alpha = 0$$


This is one equation for the two unknown slopes. For the second equation, use the second condition, the one on $H$.

$$H = y'\frac{\partial f}{\partial y'} - f, \qquad\text{and}\qquad H(x_m^-) = H(x_m^+)$$

$$H = y'\big[4\alpha y'^3 - 2\beta y'\big] - \big[\alpha y'^4 - \beta y'^2\big] = 3\alpha y'^4 - \beta y'^2$$

$$\big[(y'^-) - (y'^+)\big]\big[(y'^+)^3 + (y'^+)^2(y'^-) + (y'^+)(y'^-)^2 + (y'^-)^3 - \beta\big((y'^+) + (y'^-)\big)/3\alpha\big] = 0$$

Again, if the slope is not continuous this is

$$(y'^+)^3 + (y'^+)^2(y'^-) + (y'^+)(y'^-)^2 + (y'^-)^3 - \beta\big((y'^+) + (y'^-)\big)/3\alpha = 0$$

These are two equations in the two unknown slopes. It looks messy, but look back at $H$ itself first. It's even. That
means that its continuity will always hold if the slope simply changes sign.

$$(y'^+) = -(y'^-)$$

Can this work in the other (momentum) equation?

$$(y'^+)^2 + (y'^+)(y'^-) + (y'^-)^2 - \beta/2\alpha = 0 \qquad\text{is now}\qquad (y'^+)^2 = \beta/2\alpha$$

As long as $\alpha$ and $\beta$ have the same sign, this has the solution

$$(y'^+) = \pm\sqrt{\beta/2\alpha}, \qquad (y'^-) = \mp\sqrt{\beta/2\alpha} \tag{16.46}$$

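The boundary conditions on the original problem were $y(0) = 0$ and $y(a) = b$. Denote $\gamma = \pm\sqrt{\beta/2\alpha}$, and
$x_1 = a/2 + b/2\gamma$, then

$$y = \begin{cases} \gamma x & (0 < x < x_1) \\ b - \gamma(x - a) & (x_1 < x < a) \end{cases}$$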


The paths labeled 0, 1, and 2 are three solutions that make the variational functional derivative vanish. Which is smallest? 
Does that answer depend on the choice of the parameters? See problem 16.19. 

Are there any other solutions? After all, once you’ve found three, you should wonder if 
it stops there. Yes, there are many — infinitely many in this example. They are characterized 
by the same slopes, $\pm\gamma$, but they switch back and forth several times before coming to the
endpoint. The same internal boundary conditions ( p and H) apply at each corner, and there’s 
nothing in their solutions saying that there is only one such kink. 

Do you encounter such weird behavior often, with an infinite number of solutions? No, but you see from this 
example that it doesn't take a very complicated integrand to produce such a pathology. 
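
It is easy to confirm by machine that a path with a single corner really does satisfy the two conditions of Eq. (16.44). The Python sketch below uses sample values of the parameters that are assumptions for illustration only (with $\alpha$ and $\beta$ of the same sign, and with $|b/\gamma| < a$ so that the corner lies between the endpoints). It checks that $p$ and $H$ match across the corner and that the two straight pieces join continuously and reach the endpoint.

```python
import math

alpha, beta = 1.0, 2.0        # assumed sample values with the same sign
a, b = 2.0, 0.5               # endpoint: y(a) = b

def p(s):
    """p = df/dy' for f = alpha*y'^4 - beta*y'^2, evaluated at slope s."""
    return 4.0 * alpha * s ** 3 - 2.0 * beta * s

def H(s):
    """H = y'*df/dy' - f, evaluated at slope s."""
    return 3.0 * alpha * s ** 4 - beta * s ** 2

gamma = math.sqrt(beta / (2.0 * alpha))
x1 = a / 2.0 + b / (2.0 * gamma)

def y(x):
    """Path that starts with slope +gamma and switches to -gamma at x1."""
    return gamma * x if x < x1 else b - gamma * (x - a)

print("p(+gamma), p(-gamma):", p(gamma), p(-gamma))
print("H(+gamma), H(-gamma):", H(gamma), H(-gamma))
print("kink at x1 =", x1, " inside (0, a):", 0.0 < x1 < a)
print("y just left and right of x1:", gamma * x1, b - gamma * (x1 - a))
print("y(a) =", y(a))
```

Replacing `gamma` by its negative gives the other one-corner path, the one that starts downward. Comparing the values of the integral along the different paths is left for problem 16.19.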

16.10 Second Order

Except for a couple of problems in optics in chapter two, 2.35 and 2.39, I've mostly ignored the question about minimum 
versus maximum. 

• Does it matter in classical mechanics whether the integral, $\int L\,dt$ is minimum or not in determining the equations of
motion? No. 

• In geometric optics, does it matter whether Fermat’s principle of least time for the path of the light ray is really 
minimum? Yes, in this case it does, because it provides information about the focus. 

• In the calculation of minimum energy electric potentials in a capacitor does it matter? No, but only because it’s always 
a minimum. 

• In problems in quantum mechanics similar to the electric potential problem, the fact that you're dealing sometimes 
with a minimum and sometimes not leads to some serious technical difficulties. 

How do you address this question? The same way you do in ordinary calculus: See what happens to the second 
order terms in your expansions. Take the same general form as before and keep terms through second order. Assume 
that the endpoints are fixed. 



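$$\begin{aligned}
\Delta I &= I[y + \delta y] - I[y] = \int_a^b dx\,F\big(x, y + \delta y, y' + \delta y'\big) - \int_a^b dx\,F(x, y, y') \\
&= \int_a^b dx\left[\frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y'
+ \frac{1}{2}\frac{\partial^2 F}{\partial y^2}(\delta y)^2 + \frac{\partial^2 F}{\partial y\,\partial y'}\delta y\,\delta y' + \frac{1}{2}\frac{\partial^2 F}{\partial y'^2}(\delta y')^2\right]
\end{aligned} \tag{16.48}$$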


If the first two terms combine to zero, this says the first derivative is zero. Now for the next terms. 


Recall the similar question that arose in section 8.11. How can you tell if a function of two variables has a minimum, 
a maximum, or neither? The answer required looking at the matrix of all the second derivatives of the function — the 
Hessian. Now, instead of a 2 x 2 matrix as in Eq. (8.31) you have an integral. 

In the two dimensional case $\nabla f = 0$ defines a minimum if the product $\langle d\vec r, H\,d\vec r\,\rangle$ is positive for all possible directions
$d\vec r$. For this new case the "directions" are the possible functions $\delta y$ and its derivative $\delta y'$.

The direction to look first is where $\delta y'$ is big. The reason is that I can have a very small $\delta y$ that has a very big
$\delta y'$: $10^{-3}\sin(10^6 x)$. If $\Delta I$ is to be positive in every direction, it has to be positive in this one. That requires $F_{y'y'} > 0$.



Is it really that simple? No. First the $\delta y$ terms can be important too, and second $y$ can itself have several components.
Look at the latter first. The final term in Eq. (16.48) should be

$$\int_a^b dx\,\frac{1}{2}\sum_{m,n}\frac{\partial^2 F}{\partial y'_m\,\partial y'_n}\,\delta y'_m\,\delta y'_n$$


This set of partial derivatives of F is at each point along the curve a Hessian. At each point it has a set of eigenvalues and 
eigenvectors, and if all along the path all the eigenvalues are always positive, it meets the first, necessary conditions for 
the original functional to be a minimum. If you look at an example from mechanics, for which the independent variable 
is time, these $y'_n$ terms are then $\dot x_n$ instead. Terms such as these typically represent kinetic energy and you expect that
to be positive. 

An example:

$$S[\vec r\,] = \int_0^T dt\,L(x, y, \dot x, \dot y, t) = \int_0^T dt\,\frac{1}{2}\big[\dot x^2 + \dot y^2 + 2\gamma t\,\dot x\dot y - x^2 - y^2\big]$$

This is the action for a particle moving in two dimensions $(x, y)$ with the specified Lagrangian. The equations of motion
are

$$\frac{\delta S}{\delta x} = -\ddot x - x - \gamma\big(t\ddot y + \dot y\big) = 0$$

$$\frac{\delta S}{\delta y} = -\ddot y - y - \gamma\big(t\ddot x + \dot x\big) = 0$$

If $\gamma = 0$ you have two independent harmonic oscillators.

The matrix of derivatives of $L$ with respect to $\dot x = \dot y_1$ and $\dot y = \dot y_2$ is

$$\frac{\partial^2 L}{\partial\dot y_m\,\partial\dot y_n} = \begin{pmatrix} 1 & \gamma t \\ \gamma t & 1 \end{pmatrix}$$

The eigenvalues of this matrix are $1 \pm \gamma t$, with corresponding eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. This Hessian then says
that $S$ should be a minimum up to the time $t = 1/\gamma$, but not after that. This is also a singular point of the differential
equations for $x$ and $y$.
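
For a matrix this small the eigenvalues come from a one-line formula, and a few lines of Python show where the Hessian stops being positive definite. The value of $\gamma$ below is an assumption chosen so that $1/\gamma = 2$, and the helper function is written for this illustration.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smaller one first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

gamma = 0.5                       # assumed coupling, so 1/gamma = 2
for t in (0.0, 1.0, 1.9, 2.0, 2.1, 4.0):
    lo, hi = eigenvalues_sym2(1.0, gamma * t, 1.0)
    verdict = "positive definite" if lo > 0 else "not positive definite"
    print(f"t = {t:3.1f}   eigenvalues = {lo:+.3f}, {hi:+.3f}   {verdict}")
```

The smaller eigenvalue, $1 - \gamma t$, passes through zero at $t = 1/\gamma$, and beyond that time the necessary condition for a minimum fails.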


Focus 

When the Hessian made from the $\delta y'^2$ terms has only positive eigenvalues everywhere, the preceding analysis might lead
you to believe that the functional is always a minimum. Not so. That condition is necessary; it is not sufficient. It says
that the functional is a minimum with respect to rapidly oscillating $\delta y$. It does not say what happens if $\delta y$ changes
gradually over the course of the integral. If this happens, and if the length of the interval of integration is long enough,
the $\delta y'$ terms may be the small ones and the $(\delta y)^2$ may then dominate over the whole length of the integral. This is
exactly what happens in the problems 2.35, 2.39, and 16.17.

When this happens in an optical system, where the functional $T = \int d\ell/v$ is the travel time along the path, it
signals something important. You have a focus. An ordinary light ray obeys Fermat’s principle that T is stationary with 
respect to small changes in the path. It is a minimum if the path is short enough. A focus occurs when light can go 
from one point to another by many different paths, and for still longer paths the path is neither minimum nor maximum. 



In the integral for T, where the starting point and the ending point are the source and image points, the second order 
variation will be zero for these long, gradual changes in the path. The straight-line path through the center of the lens 
takes least time if its starting point and ending point are closer than this source and image. The same path will be a 
saddle (neither maximum nor minimum) if the points are farther apart than this. This sort of observation led to the 
development of the mathematical subject called “Morse Theory,” a topic that has had applications in studying such 
diverse subjects as nuclear fission and the gravitational lensing of light from quasars. 


Thin Lens 

This provides a simple way to understand the basic equation for a thin lens. Let its thickness be $t$, its radius $r$,
and its index of refraction $n$. Light that passes through this lens along the straight line through the center moves more
slowly as it passes through the thickness of the lens, and takes a time

$$T_1 = \frac{1}{c}(p + q - t) + \frac{n}{c}\,t$$

Light that takes a longer path through the edge of the lens encounters no glass along the way, and it takes a time

$$T_2 = \frac{1}{c}\Bigl[\sqrt{p^2 + r^2} + \sqrt{q^2 + r^2}\,\Bigr]$$

If $p$ and $q$ represent the positions of a source and the position of its image at a focus, these two times should be equal.
At least they should be equal in the approximation that the lens is thin and when you keep terms only to the second
order in the variation of the path.

$$T_2 = \frac{1}{c}\Bigl[p\sqrt{1 + r^2/p^2} + q\sqrt{1 + r^2/q^2}\,\Bigr]
\approx \frac{1}{c}\Bigl[p\bigl(1 + r^2/2p^2\bigr) + q\bigl(1 + r^2/2q^2\bigr)\Bigr]$$

Equate $T_1$ and $T_2$.

$$(p + q - t) + nt = p\bigl(1 + r^2/2p^2\bigr) + q\bigl(1 + r^2/2q^2\bigr)$$

$$(n - 1)\,t = \frac{r^2}{2p} + \frac{r^2}{2q}$$

$$\frac{1}{p} + \frac{1}{q} = \frac{2(n - 1)\,t}{r^2} = \frac{1}{f} \tag{16.49}$$

This is the standard equation describing the focusing properties of thin lenses as described in every elementary physics
text that even mentions lenses. The focal length of the lens is then $f = r^2/2(n - 1)t$. That is not the expression you
usually see, but it is the same. See problem 16.21. Notice that this equation for the focus applies whether the lens is
double convex or plano-convex or meniscus. If you allow the thickness $t$ to be negative (equivalent to saying that
there's an extra time delay at the edges instead of in the center), then this result still works for a diverging lens, though
the analysis leading up to it requires more thought.
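
A quick numerical check of Eq. (16.49) can make the "equal to second order" statement concrete. The sketch below is only an illustration: the function names and the sample numbers (index 1.5, a lens 2 mm thick and 2 cm in radius, source 0.6 m away) are assumptions chosen for the example, not values from the text. It computes the focal length, solves for the image distance, and compares the two optical path lengths $cT_1$ and $cT_2$.

```python
import math

def focal_length(n, t, r):
    """Focal length from Eq. (16.49): f = r^2 / (2 (n - 1) t)."""
    return r**2 / (2.0 * (n - 1.0) * t)

def image_distance(p, f):
    """Solve 1/p + 1/q = 1/f for q."""
    return 1.0 / (1.0 / f - 1.0 / p)

# Illustrative values (assumptions), lengths in meters.
n, t, r = 1.5, 0.002, 0.02
f = focal_length(n, t, r)
p = 0.6
q = image_distance(p, f)

# Optical path lengths c*T for the two routes described in the text.
path_center = (p + q - t) + n * t                             # through the middle of the lens
path_edge = math.sqrt(p**2 + r**2) + math.sqrt(q**2 + r**2)   # past the edge, no glass
mismatch = abs(path_center - path_edge)

print(f, q, mismatch)
```

The two paths differ from the direct distance $p + q$ by about a millimeter here, while the leftover mismatch between them is roughly a thousand times smaller. That residue comes from the terms of fourth order in $r$ that the derivation drops.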


Exercises 

1 For the functional F[x] = x(0) + Jq dt ( x(t ) 2 + x{t) 2 ) and the function x(t) = 1 + 1 2 , evaluate F[x], 

2 For the functional $F[x] = \int_0^1 dt\, x(t)^2$ with the boundary conditions $x(0) = 0$ and $x(1) = 1$, what is the minimum
value of $F$ and what function $x$ gives it? Start by drawing graphs of various $x$ that satisfy these boundary conditions.
Is there any reason to require that $x$ be a continuous function of $t$?

3 With the functional $F$ of the preceding exercise, what is the functional derivative $\delta F/\delta x$?

Problems

16.1 You are near the edge of a lake and see someone in the water needing help. What path do you take to get there
in the shortest time? You can run at a speed $v_1$ on the shore and swim at a probably slower speed $v_2$ in the water.
Assume that the edge of the water forms a straight line, and express your result in a way that's easy to interpret, not as
the solution to some quartic equation. Ans: Snell's Law.

16.2 The cycloid is the locus of a point that is on the edge of a circle that is itself rolling along a straight line: a
pebble caught in the tread of a tire. Use the angle of rotation as a parameter and find the parametric equations for $x(\theta)$
and $y(\theta)$ describing this curve. Show that it is Eq. (16.17).

16.3 In Eq. (16.17), describing the shortest-time slide of a particle, what is the behavior of the function for $y \ll a$? In
figuring out the series expansion of $w = \cos^{-1}(1 - t)$, you may find it useful to take the cosine of both sides. Then you
should be able to find that the two lowest order terms in this expansion are $w = \sqrt{2t} + t^{3/2}/6\sqrt{2}$. You will need both
terms. Ans: $x = \sqrt{2y^3/a}\,/3$

16.4 The dimensions of an ordinary derivative such as $dx/dt$ are the quotient of the dimensions of the numerator and
the denominator (here L/T). Does the same statement apply to the functional derivative? 

16.5 Use Fermat’s principle to derive both Snell’s law and the law of reflection at a plane surface. 

Assume two straight line segments from the starting point to the ending point and minimize the total 
travel time of the light. The drawing applies to Snell’s law, and you can compute the travel time of 
the light as a function of the coordinate x at which the light hits the surface and enters the higher 
index medium. 

16.6 Analyze the path of light over a roadway starting from Eq. (16.23) but using $x$ as the independent variable instead
of $y$.

16.7 (a) Fill in the steps leading to Eq. (16.31). And do you understand the point of the rearrangements that I did just 
preceding it? Also, can you explain why the form of the function Eq. (16.30) should have been obvious without solving 
any extra boundary conditions? (b) When you can explain that in a few words, then what general cubic polynomial can 
you use to get a still better result? 

16.8 For the function $F(x, y, y') = x^2 + y^2 + y'^2$, explicitly carry through the manipulations leading to Eq. (16.41).

16.9 Use the explicit variation in Eq. (16.8) and find the minimum of that function of $\epsilon$. Compare that minimum to
the value found in Eq. (16.11). Ans: 1.64773

16.10 Does either of the functions, Eqs. (16.29) or (16.30), satisfy Laplace's equation?

16.11 For the function $F(x, y, y') = x^2 + y^2 + y'^2$, repeat the calculation of $\delta I$ only now keep all the higher order terms
in $\delta y$ and $\delta y'$. Show that the solution Eq. (16.11) is a minimum.


16.12 Use the techniques leading to Eq. (16.21) in order to solve the brachistochrone problem Eq. (16.13) again. This 
time use x as the independent variable instead of y. 

16.13 On a right circular cylinder, find the path that represents the shortest distance between two points.
$d\ell^2 = dz^2 + R^2\, d\phi^2$.


16.14 Put two circular loops of wire in a soap solution and draw them out, keeping their planes parallel. If 
they are fairly close you will have a soap film that goes from one ring to the other, and the minimum energy 
solution is the one with the smallest area. What is the shape of this surface? Use cylindrical coordinates 
to describe the surface. It is called a catenoid, and its equation involves a hyperbolic cosine. 

16.15 There is one part of the derivation going from Eq. (16.33) to (16.36) that I omitted: the special
cases of $\ell = 1$ and $\ell = N - 1$. Go back and finish that detail, showing that you get the same result even
in this case.



16.16 Section 16.6 used the trapezoidal rule for numerical integration and the two-point centered difference for
differentiation. What happens to the derivation if (a) you use Simpson's rule for integration or if (b) you use an asymmetric
differentiation formula such as $y'(0) \approx [y(h) - y(0)]/h$?

16.17 For the simple harmonic oscillator, $L = m\dot x^2/2 - m\omega^2 x^2/2$. Use the time interval $0 < t < T$ so that
$S = \int_0^T L\, dt$, and find the equation of motion from $\delta S/\delta x = 0$. When the independent variable $x$ is changed to
$x + \delta x$, keep the second order terms in computing $\delta S$ this time and also make an explicit choice of

$$\delta x(t) = \epsilon \sin(n\pi t/T)$$

For integer $n = 1, 2, \ldots$ this satisfies the boundary conditions that $\delta x(0) = \delta x(T) = 0$. Evaluate the change in $S$
through second order in $\epsilon$ (that is, do it exactly). Find the conditions on the interval $T$ so that the solution to
$\delta S/\delta x = 0$ is in fact a minimum. Then find the conditions when it isn't, and what is special about the $T$ for which
$S[x]$ changes its structure? Note: This $T$ is defined independently from $\omega$. It specifies an arbitrary time interval for
the integral.

16.18 Eq. (16.37) describes a particle with a specified potential energy. For a charge in an electromagnetic field let
$U = qV(x_1, x_2, x_3, t)$ where $V$ is the electric potential. Now how do you include magnetic effects? Add another term
to $L$ of the form $C\,\dot{\vec r}\cdot\vec A(x_1, x_2, x_3, t)$. Figure out what the Lagrange equations are, making
$\delta S/\delta x_k = 0$. What value must $C$ have in order that this matches $\vec F = q(\vec E + \vec v \times \vec B) = m\vec a$
with $\vec B = \nabla \times \vec A$? What is $\vec E$ in terms of $V$ and $\vec A$? Don't forget the chain rule.
Ans: $C = q$ and then $\vec E = -\nabla V - \partial \vec A/\partial t$

16.19 (a) For the solutions that spring from Eq. (16.46), which of the three results shown have the largest and smallest
values of $\int f\, dx$? Draw a graph of $f(y')$ and see where the characteristic slope of Eq. (16.46) is with respect to the
graph.

(b) There are circumstances for which these kinked solutions, Eq. (16.47) do and do not occur; find them and explain 
them. 

16.20 What are the Euler-Lagrange equations for $I[y] = \int dx\, F(x, y, y', y'')$?

16.21 The equation for the focal length of a thin lens, Eq. (16.49), is not the traditional one found in most texts. That
is usually expressed in terms of the radii of curvature of the lens surfaces. Show that this is the same. Also note that
Eq. (16.49) is independent of the optics sign conventions for curvature.

16.22 The equation (16.25) is an approximate solution to the path for light above a hot road. Is there a function
$n = f(y)$ representing the index of refraction above the road surface such that this equation would be its exact solution?

16.23 On the first page of this chapter, you see the temperature dependence of length measurements. (a) Take a metal
disk of radius $a$ and place it centered on a block of ice. Assume that the metal reaches an equilibrium temperature
distribution $T(r) = T_0(r^2/a^2 - 1)$. The temperature at the edge is $T = 0$, and the ruler is calibrated there. The disk
itself remains flat. Measure the distance from the origin straight out to the radial coordinate $r$. Call this measured radius
$s$. Measure the circumference of the circle at this radius and then express this circumference in terms of the measured
radius $s$.

(b) On a sphere of radius $R$ (constant temperature) start at the pole ($\theta = 0$) and write the distance along the arc at
constant $\phi$ down to the angle $\theta$. Now go around the circle at this constant $\theta$ and write its circumference. Express this
circumference in terms of the distance you just wrote for the "radius" of this circle.

(c) Show that the geometry you found in (a) is the same as that in (b) and find the radius of the sphere that this "heat
metric" expresses. Ans: $R = a/2\sqrt{\alpha T_0(1 - \alpha T_0)} \approx a/2\sqrt{\alpha T_0}$




16.24 Using the same techniques as in section 16.5, apply these methods to two concentric spheres. Again, use a linear
and then a quadratic approximation. Before you do this, go back to Eq. (16.30) and see if you can arrive at that form
directly, without going through all the manipulations of solving for $\alpha$ and $\beta$. That is, determine how you could have
gotten to (16.30) easily. Check some numbers against the exact answer.

16.25 For the variational problem Eq. (16.45) one solution is $y = hx/a$. Assume that $\alpha, \beta > 0$ and determine if
this is a minimum or maximum or neither. Do this also for the other solution, Eq. (16.47). Ans: The first is min if
$h/a > \sqrt{\beta/6\alpha}$. The kinked solution is always a minimum.

16.26 If you can construct glass with a variable index of refraction, you can make a flat lens with
an index that varies with distance from the axis. What function of distance must the index $n(r)$
be in order that this flat cylinder of glass of thickness $t$ has a focal length $f$? All small angles and
thin lenses of course. Ans: $n(r) = n(0) - r^2/2ft$.



Densities and Distributions 


Back in section 12.1 I presented a careful and full definition of the word “function.” This is useful even though 
you should already have a pretty good idea of what the word means. If you haven’t read that section, now would be a 
good time. The reason to review it is that this definition doesn’t handle all the cases that naturally occur. This will lead 
to the idea of a "generalized function.” 

There are (at least) two approaches to this subject: one that relates it to the ideas of functionals as you saw them
in the calculus of variations, and one that is more intuitive and is good enough for most purposes. The latter appears in
section 17.5, and if you want to jump there first, I can't stop you.


17.1 Density 

What is density? If the answer is "mass per unit volume" then what does that mean? It clearly doesn’t mean what it 
says, because you aren't required* to use a cubic meter. 

It's a derivative. Pick a volume $\Delta V$ and find the mass in that volume to be $\Delta m$. The average
volume-mass-density in that volume is $\Delta m/\Delta V$. If the volume is the room that you're sitting in, the mass includes
you and the air and everything else in the room. Just as in defining the concept of velocity (instantaneous velocity), you
have to take a limit. Here the limit is

$$\lim_{\Delta V \to 0} \frac{\Delta m}{\Delta V} = \frac{dm}{dV} \tag{17.1}$$


Even this isn’t quite right, because the volume could as easily shrink to zero by approaching a line, and that’s not what 
you want. It has to shrink to a point, but the standard notation doesn't let me say that without introducing more 
symbols than I want. 

Of course there are other densities. If you want to talk about paper or sheet metal you may find area-mass-density
to be more useful, replacing the volume $\Delta V$ by an area $\Delta A$. Maybe even linear mass density if you are describing wires,
so that the denominator is $\Delta L$. And why is the numerator a mass? Maybe you are describing volume-charge-density or
even population density (people per area). This last would appear in mathematical notation as $dN/dA$.

This last example manifests a subtlety in all of these definitions. In the real world, you can't take the limit as
$\Delta A \to 0$. When you count the number of people in an area you can't very well let the area shrink to zero. When
you describe mass, remember that the world is made of atoms. If you let the volume shrink too much you'll either be
between or inside the atoms. Maybe you will hit a nucleus; maybe not. This sort of problem means that you have to
stop short of the mathematical limit and let the volume shrink to some size that still contains many atoms, but that is
small enough so the quotient $\Delta m/\Delta V$ isn't significantly affected by further changing $\Delta V$. Fortunately, this fine point
seldom gets in the way, and if it does, you'll know it fast. I'll ignore it. If you're bothered by it remember that you are
accustomed to doing the same thing when you approximate a sum by an integral. The world is made of atoms, and any
common computation about a physical system will really involve a sum over all the atoms in the system (e.g. find the
center of mass). You never do this, preferring to do an integral instead even though this is an approximation to the sum
over atoms.

If you know the density (when the word is used unqualified it commonly means volume-mass-density) you find
mass by integration over the volume you have specified.

$$m = \int_V \rho\, dV \tag{17.2}$$

You can even think of this as a new kind of function $m(V)$: input a specification for a volume of space; output a mass.
That's really what density provides, a prescription to go from a volume specification to the amount of mass within that
volume.

For the moment, I'll restrict the subject to linear mass density, so that you simply need the coordinate along
a straight line,

$$\lambda(x) = \frac{dm}{dx}, \qquad\text{and}\qquad m = \int \lambda(x)\, dx \tag{17.3}$$

If $\lambda$ represents a function such as $Ax^2$ $(0 < x < L)$, a bullwhip perhaps, then this is elementary, and
$m_{\text{total}} = AL^3/3$. I want to look at the reverse specification. Given an interval, I will specify the amount of mass in
that interval and work backwards. The first example will be simple. The interval $x_1 < x < x_2$ is denoted $[x_1, x_2]$. The
function $m$ has this interval for its argument.*

$$m([x_1, x_2]) = \begin{cases}
0 & (x_1 < x_2 < 0) \\
A x_2^3/3 & (x_1 < 0 < x_2 < L) \\
A L^3/3 & (x_1 < 0 < L < x_2) \\
A(x_2^3 - x_1^3)/3 & (0 < x_1 < x_2 < L) \\
A(L^3 - x_1^3)/3 & (0 < x_1 < L < x_2) \\
0 & (L < x_1 < x_2)
\end{cases} \tag{17.4}$$


* I'm abusing the notation here. In (17.2) $m$ is a number. In (17.4) $m$ is a function. You're used to this, and
physicists do it all the time despite reproving glances from mathematicians.

The density $Ax^2$ $(0 < x < L)$ is of course a much easier way to describe the same distribution of mass. This distribution
function, $m([x_1, x_2])$, comes from integrating the density function $\lambda(x) = Ax^2$ on the interval $[x_1, x_2]$.
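
All six cases of Eq. (17.4) follow from one rule: clip the interval to the region $0 < x < L$ where the density is non-zero, then integrate $Ax^2$ between the clipped endpoints. Here is a minimal Python sketch of that rule. The function name and the default values $A = L = 1$ are illustrative assumptions, not part of the text.

```python
def mass_in_interval(x1, x2, A=1.0, L=1.0):
    """Mass between x1 and x2 for the density A x^2 on 0 < x < L (zero elsewhere)."""
    # Clip both endpoints to [0, L]; outside that range the density contributes nothing.
    lo = min(max(x1, 0.0), L)
    hi = min(max(x2, 0.0), L)
    # Integral of A x^2 from lo to hi.
    return A * (hi**3 - lo**3) / 3.0

# One sample interval for each line of Eq. (17.4), using A = L = 1.
samples = [(-2.0, -1.0), (-1.0, 0.5), (-1.0, 2.0), (0.2, 0.5), (0.5, 2.0), (1.5, 2.0)]
for x1, x2 in samples:
    print((x1, x2), mass_in_interval(x1, x2))
```

The clipping is what produces the case structure: the table is long only because it spells out every way the two endpoints can sit relative to 0 and $L$.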

Another example is a variation on the same theme. It is slightly more involved, but still not too bad.

$$m([x_1, x_2]) = \begin{cases}
0 & (x_1 < x_2 < 0) \\
A x_2^3/3 & (x_1 < 0 < x_2 < L/2) \\
A x_2^3/3 + m_0 & (x_1 < 0 < L/2 < x_2 < L) \\
A L^3/3 + m_0 & (x_1 < 0 < L < x_2) \\
A(x_2^3 - x_1^3)/3 & (0 < x_1 < x_2 < L/2) \\
A(x_2^3 - x_1^3)/3 + m_0 & (0 < x_1 < L/2 < x_2 < L) \\
A(L^3 - x_1^3)/3 + m_0 & (0 < x_1 < L/2 < L < x_2) \\
A(x_2^3 - x_1^3)/3 & (L/2 < x_1 < x_2 < L) \\
A(L^3 - x_1^3)/3 & (L/2 < x_1 < L < x_2) \\
0 & (L < x_1 < x_2)
\end{cases} \tag{17.5}$$


If you read through all these cases, you will see that the sole thing that I've added to the first example is a point mass
$m_0$ at the point $L/2$. What density function $\lambda$ will produce this distribution? Answer: No function will do this. That's
why the concept of a "generalized function" appeared. I could state this distribution function in words by saying

"Take Eq. (17.4) and if $[x_1, x_2]$ contains the point $L/2$ then add $m_0$."

That there's no density function $\lambda$ that will do this is inconvenient but not disastrous. When the very idea of a density
was defined in Eq. (17.1), it started with the distribution function, the mass within the volume, and only arrived at the
definition of a density by some manipulations. The density is a type of derivative and not all functions are differentiable.
The function $m([x_1, x_2])$ or $m(V)$ is more fundamental (if less convenient) than is the density function.
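
To see concretely why no density function can reproduce Eq. (17.5), look at the average density $\Delta m/\Delta x$ over a shrinking interval. The sketch below is an illustration under stated assumptions: $A = L = 1$, $m_0 = 0.5$, hypothetical function names, and an interval that counts its endpoints as inside. Away from $L/2$ the quotient settles down to $Ax^2$; centered on $L/2$ it grows like $m_0/\Delta x$ and has no limit.

```python
def mass_with_point(x1, x2, A=1.0, L=1.0, m0=0.5):
    """Eq. (17.4) plus a point mass m0 at L/2, as described in words in the text."""
    lo = min(max(x1, 0.0), L)
    hi = min(max(x2, 0.0), L)
    mass = A * (hi**3 - lo**3) / 3.0
    # Assumption: an endpoint sitting exactly on L/2 counts as containing the point.
    if x1 <= L / 2.0 <= x2:
        mass += m0
    return mass

def average_density(center, width):
    """Delta m / Delta x for an interval of the given width around center."""
    return mass_with_point(center - width / 2.0, center + width / 2.0) / width

for width in (1e-1, 1e-3, 1e-5):
    print(width, average_density(0.25, width), average_density(0.5, width))
```

The middle column converges to $0.0625 = A(0.25)^2$, while the last column keeps growing as the interval shrinks.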

17.2 Functionals 


$$F[\phi] = \int_{-\infty}^{\infty} dx\, f(x)\phi(x)$$

defines a scalar-valued function of a function variable. Given any (reasonable) function $\phi$ as input, it returns a scalar.
That is a functional. This one is a linear functional because it satisfies the equations

$$F[a\phi] = aF[\phi] \qquad\text{and}\qquad F[\phi_1 + \phi_2] = F[\phi_1] + F[\phi_2] \tag{17.6}$$

This isn't a new idea; it's just a restatement of a familiar idea in another language. The mass density can define a useful
functional (linear density for now). Given $dm/dx = \lambda(x)$ what is the total mass?

$$\int_{-\infty}^{\infty} dx\, \lambda(x)\, 1 = M_{\text{total}}$$

Where is the center of mass?

$$\frac{1}{M_{\text{total}}} \int_{-\infty}^{\infty} dx\, \lambda(x)\, x = x_{\text{cm}}$$

Restated in the language of functionals,

$$F[\phi] = \int_{-\infty}^{\infty} dx\, \lambda(x)\phi(x) \qquad\text{then}\qquad M_{\text{total}} = F[1], \qquad x_{\text{cm}} = \frac{1}{M_{\text{total}}} F[x]$$

If, instead of mass density, you are describing the distribution of grades in a large class or the distribution of the
speed of molecules in a gas, there are still other ways to use this sort of functional. If $dN/dg = f(g)$ is the grade density
in a class (number of students per grade interval), then with $F[\phi] = \int dg\, f(g)\phi(g)$

$$\begin{aligned}
N_{\text{students}} &= F[1], &
\text{mean grade} = \bar g &= \frac{1}{N_{\text{students}}} F[g], \\
\text{variance} = \sigma^2 &= \frac{1}{N_{\text{students}}} F[(g - \bar g)^2], &
\text{skewness} &= \frac{1}{N_{\text{students}}\,\sigma^3} F[(g - \bar g)^3], \\
\text{kurtosis excess} &= \frac{1}{N_{\text{students}}\,\sigma^4} F[(g - \bar g)^4] - 3
\end{aligned} \tag{17.7}$$

Unless you’ve studied some statistics, you will probably never have heard of skewness and kurtosis excess. They are ways 
to describe the shape of the density function, and for a Gaussian both these numbers are zero. If it’s skewed to one side 
the skewness is non-zero. [Did I really say that?] The kurtosis excess compares the flatness of the density function to 
that of a Gaussian. 
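
The statement that both shape numbers vanish for a Gaussian is easy to check numerically from Eq. (17.7). The sketch below is only an illustration: the class of 200 students with mean 70 and standard deviation 8 is a made-up example, the helper `make_functional` is a hypothetical name, and the integral is approximated by a simple midpoint rule over ten standard deviations on each side of the mean.

```python
import math

def make_functional(f, a, b, steps=20000):
    """Return F with F(phi) approximating the integral of f(g) phi(g) over [a, b]."""
    h = (b - a) / steps
    points = [a + (i + 0.5) * h for i in range(steps)]   # midpoints of the subintervals
    weights = [h * f(g) for g in points]
    def F(phi):
        return sum(w * phi(g) for w, g in zip(weights, points))
    return F

# Hypothetical Gaussian grade density dN/dg (all numbers are assumptions).
N, mu, s = 200.0, 70.0, 8.0
def grade_density(g):
    return N * math.exp(-(g - mu) ** 2 / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

F = make_functional(grade_density, mu - 10.0 * s, mu + 10.0 * s)

n_students = F(lambda g: 1.0)
mean = F(lambda g: g) / n_students
sigma = math.sqrt(F(lambda g: (g - mean) ** 2) / n_students)
skewness = F(lambda g: (g - mean) ** 3) / (n_students * sigma ** 3)
kurtosis_excess = F(lambda g: (g - mean) ** 4) / (n_students * sigma ** 4) - 3.0

print(n_students, mean, sigma, skewness, kurtosis_excess)
```

Every quantity comes from the same functional $F$ applied to a different input function, which is the point of writing the statistics this way.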

The Maxwell-Boltzmann function describing the speeds of molecules in an ideal gas at temperature $T$ is

$$f_{\text{MB}}(v) = \left(\frac{m}{2\pi kT}\right)^{3/2} 4\pi v^2\, e^{-mv^2/2kT} \tag{17.8}$$

$dN/dv$ is the number of molecules per speed interval, but this function $f_{\text{MB}}$ is normalized differently. It is instead
$(dN/dv)/N_{\text{total}}$. It is the fraction of the molecules per speed interval. That saves carrying along a factor specifying
the total number of molecules in the gas. I could have done the same thing in the preceding example of student
grades, defining the functional $F_1 = F/N_{\text{students}}$. Then the equations (17.7) would have a simpler appearance, such as
$\bar g = F_1[g]$. For the present case of molecular speeds, with $F[\phi] = \int_0^\infty dv\, f_{\text{MB}}(v)\phi(v)$,

$$F[1] = 1, \qquad F[v] = \bar v = \sqrt{\frac{8kT}{\pi m}}, \qquad F[mv^2/2] = \overline{\text{K.E.}} = \frac{3}{2}kT \tag{17.9}$$

Notice that the mean kinetic energy is not the kinetic energy that corresponds to the mean speed. 
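
The three results in Eq. (17.9), and the remark about the mean kinetic energy, can be checked with a few lines of Python. This is a sketch under assumptions: units are chosen so that $m = kT = 1$, the integral is cut off at 20 times $\sqrt{kT/m}$ where the integrand is negligible, and the function names are illustrative.

```python
import math

def f_mb(v, m=1.0, kT=1.0):
    """Maxwell-Boltzmann speed distribution, Eq. (17.8)."""
    return (m / (2.0 * math.pi * kT)) ** 1.5 * 4.0 * math.pi * v * v * math.exp(-m * v * v / (2.0 * kT))

def F(phi, m=1.0, kT=1.0, steps=20000):
    """Midpoint-rule estimate of the integral of f_mb(v) phi(v) over 0 < v < v_max."""
    v_max = 20.0 * math.sqrt(kT / m)   # assumption: the tail beyond this is negligible
    h = v_max / steps
    return h * sum(f_mb((i + 0.5) * h, m, kT) * phi((i + 0.5) * h) for i in range(steps))

m, kT = 1.0, 1.0
norm = F(lambda v: 1.0)
v_mean = F(lambda v: v)
ke_mean = F(lambda v: 0.5 * m * v * v)
ke_of_mean_speed = 0.5 * m * v_mean ** 2

print(norm, v_mean, ke_mean, ke_of_mean_speed)
```

In these units the mean kinetic energy is $3/2$, while the kinetic energy at the mean speed is $4/\pi \approx 1.27$, so the two really are different numbers.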

Look back again to section 12.1 and you’ll see not only a definition of “function” but a definition of “functional." 
It looks different from what I’ve been using here, but look again and you will see that when you view it in the proper 
light, that of chapter six, they are the same. Equations (12.3)-(12.5) involved vectors, but remember that when you 
look at them as elements of a vector space, functions are vectors too. 

Functional Derivatives 

In section 16.2, equations (16.6) through (16.10), you saw a development of the functional derivative. What does that 
do in this case? 

$$F[\phi] = \int dx\, f(x)\phi(x), \qquad\text{so}\qquad F[\phi + \delta\phi] - F[\phi] = \int dx\, f(x)\, \delta\phi(x)$$

The functional derivative is the coefficient of $\delta\phi$ and $dx$, so it is

$$\frac{\delta F}{\delta \phi} = f \tag{17.10}$$

That means that the functional derivative of $m([x_1, x_2])$ in Eq. (17.4) is the linear mass density, $\lambda(x) = Ax^2$,
$(0 < x < L)$. There are more interesting functional derivatives in chapter 16, e.g. Eq. (16.10). Is there such a thing as a
functional integral? Yes, but not here, as it goes well beyond the scope of this chapter. Its development is central in 
quantum field theory. 


17.3 Generalization 

Given a function $f$, I can create a linear functional $F$ using it as part of an integral. What sort of linear functional arises
from $f'$? Integrate by parts to find out. Here I'm going to have to assume that $f$ or $\phi$ or both vanish at infinity, or the
development here won't work.

$$F[\phi] = \int_{-\infty}^{\infty} dx\, f(x)\phi(x), \quad\text{then}\quad
\int_{-\infty}^{\infty} dx\, f'(x)\phi(x) = f(x)\phi(x)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} dx\, f(x)\phi'(x) = -F[\phi'] \tag{17.11}$$

In the same way, you can relate higher derivatives of $f$ to the functional $F$. There's another restriction you need to
make: For this functional $-F[\phi']$ to make sense, the function $\phi$ has to be differentiable. If you want higher derivatives
of $f$, then $\phi$ needs to have still higher derivatives.

If you know everything about $F$, what can you determine about $f$? If you assume
that all the functions you're dealing with are smooth, having as many derivatives as you
need, then the answer is simple: everything. If I have a rule by which to get a number $F[\phi]$
for every (smooth) $\phi$, then I can take a special case for $\phi$ and use it to find $f$. Use a $\phi$
that drops to zero very rapidly away from some given point; for example if $n$ is large this
function drops off rapidly away from the point $x_0$.

$$\phi_n(x) = \sqrt{\frac{n}{\pi}}\, e^{-n(x - x_0)^2}$$

I've arranged it so that the integral of $\phi_n$ over all $x$ is one. If I want the value of $f$ at $x_0$ I can do an integral and
take a limit.

$$\int_{-\infty}^{\infty} dx\, f(x)\phi_n(x) = \int_{-\infty}^{\infty} dx\, f(x) \sqrt{\frac{n}{\pi}}\, e^{-n(x - x_0)^2}$$

As $n$ increases to infinity, all that matters for $f$ is its value at $x_0$. The integral becomes

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} dx\, f(x_0) \sqrt{\frac{n}{\pi}}\, e^{-n(x - x_0)^2} = f(x_0) \tag{17.12}$$

This means that I can reconstruct the function $f$ if I know everything about the functional $F$. To get the value of the
derivative $f'(x_0)$ instead, simply use the function $-\phi'_n$ and take the same limit. This construction is another way to
look at the functional derivative. The equations (17.10) and (17.12) produce the same answer.
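
The reconstruction in Eq. (17.12) can be tried numerically. The sketch below is illustrative only: it takes $f = \sin$ and $x_0 = 0.7$ as an arbitrary example, uses a large but finite $n$, and approximates the integrals with a midpoint rule over a window of half-width $8/\sqrt{n}$ around $x_0$, outside of which $\phi_n$ is negligible. The function names are assumptions.

```python
import math

def phi_n(x, n, x0):
    """Narrow Gaussian of unit area centered at x0."""
    return math.sqrt(n / math.pi) * math.exp(-n * (x - x0) ** 2)

def dphi_n(x, n, x0):
    """Derivative of phi_n with respect to x."""
    return -2.0 * n * (x - x0) * phi_n(x, n, x0)

def F(f, phi, a, b, steps=4000):
    """Midpoint-rule estimate of the integral of f(x) phi(x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h) for i in range(steps))

def probe(f, x0, n):
    """Estimate f(x0) from F[phi_n] and f'(x0) from -F[phi_n']."""
    half = 8.0 / math.sqrt(n)
    value = F(f, lambda x: phi_n(x, n, x0), x0 - half, x0 + half)
    slope = -F(f, lambda x: dphi_n(x, n, x0), x0 - half, x0 + half)
    return value, slope

value, slope = probe(math.sin, 0.7, 1.0e4)
print(value, math.sin(0.7))
print(slope, math.cos(0.7))
```

With $n = 10^4$ both estimates agree with $\sin 0.7$ and $\cos 0.7$ to a few parts in $10^5$, and the agreement improves as $n$ grows.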

You say that this doesn’t sound very practical? That it is an awfully difficult and roundabout way to do something? 
Not really. It goes back to the ideas of section 17.1. To define a density you have to know how much mass is contained 
in an arbitrarily specified volume. That’s a functional. It didn't look like a functional there, but it is. You just have to 
rephrase it to see that it's the same thing. 

As before, do the case of linear mass density. $\lambda(x)$ is the density, and the functional
$F[\phi] = \int_{-\infty}^{\infty} dx\, \lambda(x)\phi(x)$. Then as in Eq. (17.4), $m([x_1, x_2])$ is that mass contained in the
interval from $x_1$ to $x_2$ and you can in turn write it as a functional.

$$\text{Let } \chi \text{ be the function}\quad \chi(x) = \begin{cases} 1 & (x_1 \le x \le x_2) \\ 0 & (\text{otherwise}) \end{cases}
\quad\text{then}\quad \int_{x_1}^{x_2} dx\, \lambda(x) = F[\chi] = m([x_1, x_2])$$

What happens if the function $f$ is itself not differentiable? I can still define the original functional. Then I'll see
what implications I can draw from the functional itself. The first and most important example of this is for $f$ a step
function.

$$f(x) = \theta(x) = \begin{cases} 1 & (x \ge 0) \\ 0 & (x < 0) \end{cases}
\qquad F[\phi] = \int_0^{\infty} dx\, \phi(x) \tag{17.13}$$

$\theta$ has a step at the origin, so of course it's not differentiable there, but if it were possible in some way to define its
derivative, then when you look at its related functional, it should give the answer $-F[\phi']$ as in Eq. (17.11), just as for
any other function. What then is $-F[\phi']$?

$$-F[\phi'] = -\int_{-\infty}^{\infty} dx\, \theta(x)\phi'(x) = -\int_0^{\infty} dx\, \phi'(x) = -\phi(x)\Big|_0^{\infty} = \phi(0) \tag{17.14}$$

This defines a perfectly respectable linear functional. Input the function $\phi$ and output the value of the function at zero.
It easily satisfies Eq. (17.6), but there is no function $f$ that when integrated against $\phi$ will yield this result. Still, it is
so useful to be able to do these manipulations as if such a function exists that the notation of a "delta function" was
invented. This is where the idea of a generalized function enters.
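
Equation (17.14) is simple enough to verify by brute force. In the sketch below the test function $\phi(x) = e^{-(x-1)^2}$ is an arbitrary choice made for illustration, and the infinite range is replaced by $[-12, 12]$, where $\phi$ and $\phi'$ are negligible at the ends. The integral of $\theta\phi'$ is done with a midpoint rule whose cell boundaries include the origin, so no sample point lands on the step.

```python
import math

def theta(x):
    """Step function of Eq. (17.13)."""
    return 1.0 if x >= 0.0 else 0.0

def phi(x):
    """An illustrative smooth test function (an assumption, not from the text)."""
    return math.exp(-(x - 1.0) ** 2)

def dphi(x):
    """Derivative of phi, worked out by hand."""
    return -2.0 * (x - 1.0) * phi(x)

def F_theta(g, a=-12.0, b=12.0, steps=24000):
    """Midpoint-rule estimate of the integral of theta(x) g(x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(theta(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))

minus_F_of_dphi = -F_theta(dphi)
print(minus_F_of_dphi, phi(0.0))
```

The two printed numbers agree to about seven decimal places: feeding $\phi'$ to the step-function functional and flipping the sign returns $\phi(0)$, which is exactly what a "derivative of $\theta$" would have to do.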

Green’s functions 

In the discussion of the Green's function solution to a differential equation in section 4.6, I started with the differential 
equation 

$$m\ddot x + kx = F(t)$$

and found a general solution in Eq. (4.34). This approach pictured the external force as a series of small impulses and
added the results from each.

$$x(t) = \int_{-\infty}^{t} dt'\, \frac{1}{m\omega_0} \sin\bigl(\omega_0(t - t')\bigr) F(t') \tag{17.15}$$

I wrote it a little differently there, but it's the same. Can you verify that this is a solution to the stated differential
equation? Simply plug in, do a couple of derivatives and see what happens. One derivative at a time:

$$\frac{dx}{dt} = \int_{-\infty}^{\infty} dt'\, G(t - t') F(t') \qquad\text{where}\qquad
G(t) = \begin{cases} \frac{1}{m} \cos\omega_0 t & (t \ge 0) \\ 0 & (t < 0) \end{cases} \tag{17.16}$$

Now for the second derivative. Oops. I can't differentiate $G$. It has a step at $t = 0$.

This looks like something you saw a few paragraphs back, where there was a step in the function $\theta$. I'm going
to handle this difficulty now by something of a kludge. In the next section you'll see the notation that makes this
manipulation easy and transparent. For now I will subtract and add the discontinuity from $G$ by using the same step
function $\theta$.

$$G(t) = \begin{cases} \frac{1}{m}\bigl[\cos\omega_0 t - 1 + 1\bigr] & (t \ge 0) \\ 0 & (t < 0) \end{cases}
= \begin{cases} \frac{1}{m}\bigl[\cos\omega_0 t - 1\bigr] & (t \ge 0) \\ 0 & (t < 0) \end{cases} + \frac{1}{m}\theta(t)
= G_0(t) + \frac{1}{m}\theta(t) \tag{17.17}$$

The (temporary) notation here is that $G_0$ is the part of $G$ that doesn't have the discontinuity at $t = 0$. That part is
differentiable. The expression for $dx/dt$ now has two terms, one from the $G_0$ and one from the $\theta$. Do the first one:

$$\frac{d}{dt} \int_{-\infty}^{\infty} dt'\, G_0(t - t') F(t') = \int_{-\infty}^{\infty} dt'\, \frac{d}{dt} G_0(t - t') F(t')
\qquad\text{and}\qquad
\frac{d}{dt} G_0(t) = \begin{cases} \frac{1}{m}\bigl[-\omega_0 \sin\omega_0 t\bigr] & (t > 0) \\ 0 & (t < 0) \end{cases}$$

The original differential equation involved $m\ddot x + kx$. The $G_0$ part of this is

$$m \int_{-\infty}^{\infty} dt' \begin{cases} \frac{1}{m}\bigl[-\omega_0 \sin\omega_0(t - t')\bigr] & (t > t') \\ 0 & (t < t') \end{cases} F(t')
\;+\; k \int_{-\infty}^{\infty} dt' \begin{cases} \frac{1}{m\omega_0} \sin\omega_0(t - t') & (t > t') \\ 0 & (t < t') \end{cases} F(t')$$

Use $k = m\omega_0^2$, and this is zero.

Now go back to the extra term in $\theta$. The $kx$ term doesn't have it, so all that's needed is

$$m\ddot x + kx = m \frac{d}{dt} \int_{-\infty}^{\infty} dt'\, \frac{1}{m}\theta(t - t') F(t') = \frac{d}{dt} \int_{-\infty}^{t} dt'\, F(t') = F(t)$$

This verifies yet again that this Green's function solution works.
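
If you would rather see the Green's function integral produce a known answer, here is a small numerical sketch. It assumes a constant force $F_0$ switched on at $t = 0$ with the oscillator initially at rest, for which the solution of $m\ddot x + kx = F_0$ is $x(t) = (F_0/k)(1 - \cos\omega_0 t)$. The function name, the midpoint rule, and the sample values of $m$, $k$, $F_0$ are all illustrative assumptions.

```python
import math

def green_solution(force, t, m, k, t_start=0.0, steps=2000):
    """Midpoint-rule estimate of x(t) = integral of sin(w0 (t - t')) F(t') / (m w0) dt'.

    Assumes the force vanishes before t_start, so the integral runs from t_start to t.
    """
    w0 = math.sqrt(k / m)
    if t <= t_start:
        return 0.0
    h = (t - t_start) / steps
    total = 0.0
    for i in range(steps):
        tp = t_start + (i + 0.5) * h          # midpoint of the i-th subinterval in t'
        total += math.sin(w0 * (t - tp)) * force(tp)
    return h * total / (m * w0)

# Illustrative numbers (assumptions): w0 = sqrt(k/m) = 2.
m, k, F0 = 2.0, 8.0, 1.0
t = 1.3
x_numeric = green_solution(lambda tp: F0, t, m, k)
x_exact = (F0 / k) * (1.0 - math.cos(math.sqrt(k / m) * t))

print(x_numeric, x_exact)
```

Any other force that starts at `t_start` can be passed in place of the constant one; only the closed-form comparison is specific to the step force.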

17.4 Delta-function Notation 

Recognizing that it would be convenient to be able to differentiate non-differentiable functions, that it would make
manipulations easier if you could talk about the density of a point mass ($m/0 = ?$), and that the impulsive force that
appears in setting up Green's functions for the harmonic oscillator isn't a function, what do you do? If you're Dirac, you
invent a notation that works.

Two functionals are equal to each other if they give the same result for all arguments. Does that apply to the
functions that go into defining them? If

$$\int_{-\infty}^{\infty} dx\, f_1(x)\phi(x) = \int_{-\infty}^{\infty} dx\, f_2(x)\phi(x)$$

for all test functions $\phi$ (smooth, infinitely differentiable, going to zero at infinity, whatever constraints we find expedient),
does it mean that $f_1 = f_2$? Well, no. Suppose that I change the value of $f_1$ at exactly one point, adding 1 there and
calling the result $f_2$. These functions aren't equal, but underneath an integral sign you can't tell the difference. In terms
of the functionals they define, they are essentially equal: "equal in the sense of distributions."

Extend this to the generalized functions

The equation (17.14) leads to specifying the functional

$$\delta[\phi] = \phi(0) \tag{17.18}$$

This delta-functional isn't a help in doing manipulations, so define the notation

$$\int_{-\infty}^{\infty} dx\, \delta(x)\phi(x) = \delta[\phi] = \phi(0) \tag{17.19}$$

This notation isn't an integral in the sense of something like section 1.6, and $\delta(x)$ isn't a function, but the notation allows
you to effect manipulations just as if they were. Note: the symbol $\delta$ here is not the same as the $\delta$ in the functional
derivative. We're just stuck with using the same symbol for two different things. Blame history and look at problem 17.10.

You can treat the step function as differentiable, with $\theta' = \delta$, and this notation leads you smoothly to the right
answer.

$$\text{Let}\quad \theta_{x_0}(x) = \begin{cases} 1 & (x \ge x_0) \\ 0 & (x < x_0) \end{cases} = \theta(x - x_0),
\quad\text{then}\quad \int_{-\infty}^{\infty} dx\, \theta_{x_0}(x)\phi(x) = \int_{x_0}^{\infty} dx\, \phi(x)$$

The derivative of this function is

$$\frac{d}{dx}\theta_{x_0}(x) = \delta(x - x_0)$$

You show this by

$$\int_{-\infty}^{\infty} dx\, \frac{d\theta_{x_0}(x)}{dx}\phi(x) = -\int_{-\infty}^{\infty} dx\, \theta_{x_0}(x)\phi'(x)
= -\int_{x_0}^{\infty} dx\, \phi'(x) = \phi(x_0)$$

The idea of a generalized function is that you can manipulate it as if it were an ordinary function provided that you put
the end results of your manipulations under an integral.

The manipulations for the harmonic oscillator in the previous section, translated to this language, become

$$m\ddot G + kG = \delta(t) \qquad\text{for}\qquad
G(t) = \begin{cases} \frac{1}{m\omega_0} \sin\omega_0 t & (t \ge 0) \\ 0 & (t < 0) \end{cases}$$

Then the solution for a forcing function $F(t)$ is

$$x(t) = \int_{-\infty}^{\infty} G(t - t') F(t')\, dt'$$

because

$$m\ddot x + kx = \int_{-\infty}^{\infty} \bigl(m\ddot G + kG\bigr) F(t')\, dt' = \int_{-\infty}^{\infty} \delta(t - t') F(t')\, dt' = F(t)$$

This is a lot simpler. Is it legal? Yes, though it took some serious mathematicians (Schwartz, Sobolev) some serious
effort to develop the logical underpinnings for this subject. The result of their work is: It's o.k.


17.5 Alternate Approach 

This delta-function method is so valuable that it's useful to examine it from more than one vantage. Here is a very 
different way to understand delta functions, one that avoids an explicit discussion of functionals. Picture a sequence of 
smooth functions that get narrower and taller as the parameter n gets bigger. Examples are 



n 1 

7T 1 + n 2 x 2 ’ 


1 sin nx 


n 

IT 


sech nx 


IT X 


(17.20) 
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Pick any one such sequence and call it 8 n (x). (A “delta sequence”) The factors in each case are arranged so that 

/ OO 

dx8 n (x ) = 1 

-OO 

As n grows, each function closes in around x = 0 and becomes very large there. Because these are perfectly smooth 
functions there's no question about integrating them. 


dx 8 n (x)(j)(x) 


(17.21) 


makes sense as long as 0 doesn’t cause trouble. You will typically have to assume that the 0 behave nicely at infinity, 
going to zero fast enough, and this is satisfied in the physics applications that we need. For large n any of these functions 
looks like a very narrow spike. If you multiply one of these 8 n s by a mass m, you have a linear mass density that is 
(for large n ) concentrated near to a point: X(x) = m8 n {x). Of course you can’t take the limit as n oo because this 
doesn't have a limit. If you could, then that would be the density for a point mass: m8{x). 

What happens to (17.21) as $n \to \infty$? For large $n$ any of these delta-sequences approaches zero everywhere except
at the origin. Near the origin $\phi(x)$ is very close to $\phi(0)$, and the function $\delta_n$ is non-zero in only the tiny region
around zero. If the function $\phi$ is simply continuous at the origin you have

$$\lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\delta_n(x)\phi(x) = \phi(0)\cdot\lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\delta_n(x) = \phi(0) \tag{17.22}$$
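
To see this limit happen numerically, here is a small Python sketch. It assumes a particular test function, $\phi(x) = e^{-x^2}\cos x$ with $\phi(0) = 1$, and a simple midpoint rule on $-5 < x < 5$. Both are illustrative choices, not from the text, and the function names are hypothetical. It uses two of the sequences in Eq. (17.20).

```python
import math

def lorentzian(n, x):
    # (n/pi) / (1 + n^2 x^2)
    return (n / math.pi) / (1.0 + (n * x) ** 2)

def sech_seq(n, x):
    # (n/pi) sech(n x); the arguments used below stay under the overflow limit of cosh
    return (n / math.pi) / math.cosh(n * x)

def phi(x):
    # Test function with phi(0) = 1 that is negligible beyond |x| = 5
    return math.exp(-x * x) * math.cos(x)

def smeared(delta_n, n, half_width=5.0, steps=100_000):
    # Midpoint-rule estimate of the integral of delta_n(x) phi(x) dx
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += delta_n(n, x) * phi(x)
    return h * total

results = {(name, n): smeared(seq, n)
           for name, seq in (("lorentzian", lorentzian), ("sech", sech_seq))
           for n in (10, 100)}
for key, value in sorted(results.items()):
    print(key, value)
```

Both sequences approach $\phi(0) = 1$ as $n$ grows. The sech sequence gets there faster because its tails fall off exponentially, while the tails of $1/(1 + n^2x^2)$ fall off only as $1/x^2$.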



At this point I can introduce a notation: 

$$\text{“}\int_{-\infty}^{\infty} dx\,\delta(x)\phi(x)\text{”} \quad\text{MEANS}\quad \lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\delta_n(x)\phi(x) \tag{17.23}$$

In this approach to distributions the collection of symbols on the left has for its definition the collection of symbols on
the right. In turn, the definition of $\int$ goes back to the fundamentals of calculus, as in section 1.6. You cannot move
the limit in this last integral under the integral sign. You can’t interchange these limits because the limit of $\delta_n$ is not a
function.

In this development you say that the delta function is a notation, not for a function, but for a process (but then, 
so is the integral sign). That means that the underlying idea always goes back to a familiar, standard manipulation of 
ordinary calculus. If you have an equation involving such a function, say 


$$\theta'(x) = \delta(x), \quad\text{then this means}\quad \theta'_n(x) = \delta_n(x)$$

and that

$$\lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\theta'_n(x)\phi(x) = \lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\delta_n(x)\phi(x) = \phi(0) \tag{17.24}$$

How can you tell if this equation is true? Remember now that in this interpretation these are sequences of ordinary,
well-behaved functions, so you can do ordinary manipulations such as partial integration. The functions $\theta_n$ are smooth
and they rise from zero to one as $x$ goes from $-\infty$ to $+\infty$. As $n$ becomes large, the interval over which this rise occurs
will become narrower and narrower. In the end these $\theta_n$ will approach the step function $\theta(x)$ of Eq. (17.13).

$$\int_{-\infty}^{\infty} dx\,\theta'_n(x)\phi(x) = \theta_n(x)\phi(x)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} dx\,\theta_n(x)\phi'(x)$$

The functions $\phi$ go to zero at infinity (they’re “test functions”) and that kills the boundary terms, leaving the last
integral standing by itself. Take the limit as $n \to \infty$ on it. You can take the limit inside the integral now, because the
limit of $\theta_n$ is a perfectly good function, even if it is discontinuous.

$$-\lim_{n\to\infty}\int_{-\infty}^{\infty} dx\,\theta_n(x)\phi'(x) = -\int_{-\infty}^{\infty} dx\,\theta(x)\phi'(x) = -\int_0^{\infty} dx\,\phi'(x) = -\phi(x)\Big|_0^{\infty} = \phi(0)$$


This is precisely what the second integral in Eq. (17.24) is. This is the proof that $\theta' = \delta$. Any proof of an equation
involving such generalized functions requires you to integrate the equation against a test function and to determine if the
resulting integral becomes an identity as $n \to \infty$. This implies that it now makes sense to differentiate a discontinuous
function, as long as you mean differentiation “in the sense of distributions.” That’s the jargon you encounter here.
An equation such as $\theta' = \delta$ makes sense only when it is under an integral sign and is interpreted in the way that you
just saw.
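
The partial-integration argument can be checked numerically. The sketch below assumes the smooth step $\theta_n(x) = \tfrac12\bigl[1 + (2/\pi)\tan^{-1} nx\bigr]$ and a hypothetical test function $\phi(x) = e^{-x^2}\cos x$ (my choice, with $\phi(0) = 1$). It evaluates minus the integral of $\theta_n\phi'$ and watches it approach $\phi(0)$.

```python
import math

def theta_n(n, x):
    # Smooth step: rises from 0 to 1 over a width of order 1/n
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(n * x))

def dphi(x):
    # Analytic derivative of the test function phi(x) = exp(-x^2) cos(x)
    return -math.exp(-x * x) * (2.0 * x * math.cos(x) + math.sin(x))

def minus_theta_dphi(n, half_width=6.0, steps=100_000):
    # Midpoint-rule estimate of -(integral of theta_n(x) phi'(x) dx)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += theta_n(n, x) * dphi(x)
    return -h * total

values = {n: minus_theta_dphi(n) for n in (10, 100, 1000)}
for n, v in values.items():
    print(n, v)   # approaches phi(0) = 1 as n grows
```

No derivative of $\theta_n$ is ever computed here. All the work is moved onto the smooth test function, which is the whole idea of differentiating “in the sense of distributions.”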

In these manipulations, where did I use the particular form of the delta sequence? Never. A particular combination
such as

$$\theta_n(x) = \frac{1}{2}\left[1 + \frac{2}{\pi}\tan^{-1} nx\right], \quad\text{and}\quad \delta_n(x) = \frac{n}{\pi}\,\frac{1}{1 + n^2x^2} \tag{17.25}$$

never appeared. Any of the other sequences would have done just as well, and all that I needed was the properties of the
sequence, not its particular representation. You can even use a delta sequence that doesn’t look like any of the functions
in Eq. (17.20).


8 n {x) = 2 



(17.26) 


This turns out to have all the properties that you need, though again, you don't have to invoke its explicit form. 
What is $\delta(ax)$? Integrate $\delta_n(ax)$ with a test function.

$$\lim_{n}\int_{-\infty}^{\infty} dx\,\delta_n(ax)\phi(x) = \lim_{n}\int_{-\infty}^{\infty} \frac{dy}{a}\,\delta_n(y)\phi(y/a)$$

where $y = ax$. Actually, this isn’t quite right. If $a > 0$ it is fine, but if $a$ is negative, then when $x \to -\infty$ you have
$y \to +\infty$. You have to change the limits to put it in the standard form. You can carry out that case for yourself and
verify that the expression covering both cases is

$$\lim_{n}\int_{-\infty}^{\infty} dx\,\delta_n(ax)\phi(x) = \lim_{n}\int_{-\infty}^{\infty} \frac{dy}{|a|}\,\delta_n(y)\phi(y/a) = \frac{1}{|a|}\phi(0)$$

Translate this into the language of delta functions and it is

$$\delta(ax) = \frac{1}{|a|}\delta(x) \tag{17.27}$$


You can prove other relations in the same way. For example

$$\delta(x^2 - a^2) = \frac{1}{2|a|}\bigl[\delta(x - a) + \delta(x + a)\bigr] \quad\text{or}\quad \delta\bigl(f(x)\bigr) = \sum_k \frac{1}{|f'(x_k)|}\,\delta(x - x_k) \tag{17.28}$$

In the latter equation, $x_k$ is a root of $f$, and you sum over all roots. Notice that it doesn't make any sense if you have
a double root. Just try to see what $\delta(x^2)$ would mean. The last of these identities contains the others as special cases.
Eq. (17.27) implies that $\delta$ is even.
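
Both scaling rules are easy to test with an explicit delta sequence. The following Python sketch uses $(n/\pi)\operatorname{sech} nx$ and a deliberately non-even test function $\phi(x) = (1 + x)e^{-x^2}$. The test function, the values $a = -3$ and root location $1.5$, and the helper names are all assumptions made for illustration.

```python
import math

def delta_n(n, u):
    # (n/pi) sech(n u), one of the delta sequences
    return (n / math.pi) / math.cosh(n * u)

def phi(x):
    # Test function, not even in x, so both roots of x^2 - b^2 contribute differently
    return (1.0 + x) * math.exp(-x * x)

def smear(inner, n, half_width=4.0, steps=200_000):
    # Midpoint-rule estimate of the integral of delta_n(inner(x)) phi(x) dx
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += delta_n(n, inner(x)) * phi(x)
    return h * total

n = 40
a = -3.0
scaled = smear(lambda x: a * x, n)                      # compare with phi(0)/|a|
expected_scaled = phi(0.0) / abs(a)

b = 1.5
two_roots = smear(lambda x: x * x - b * b, n)           # compare with the sum over roots
expected_two_roots = (phi(b) + phi(-b)) / (2.0 * abs(b))

print(scaled, expected_scaled)
print(two_roots, expected_two_roots)
```

The negative value of $a$ is there on purpose: the numerical result is positive, $\phi(0)/|a|$, which is the absolute value in the scaling rule at work.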

17.6 Differential Equations 

Where do you use these delta functions? Start with differential equations. I'll pick one that has the smallest number of 
technical questions associated with it. I want to solve the equation 


$$\frac{d^2 f}{dx^2} - k^2 f = F(x) \tag{17.29}$$


subject to conditions that f(x) should approach zero for large magnitude x. I'll assume that the given function F has 
this property too. 
But first: Let $k$ and $y$ be constants, then solve

$$\frac{d^2 g}{dx^2} - k^2 g = \delta(x - y) \tag{17.30}$$

I want a solution that is well-behaved at infinity, staying finite there. This equality is “in the sense of distributions,”
recall, and I’ll derive it a couple of different ways, here and in the next section.

First way: Treat the $\delta$ as simply a spike at $x = y$. Everywhere else on the $x$-axis it is zero. Solve the equation for
two cases then, $x < y$ and $x > y$. In both cases the form is the same.

$$g'' - k^2 g = 0, \quad\text{so}\quad g(x) = Ae^{kx} + Be^{-kx}$$

For $x < y$, I want $g(x)$ to go to zero far away, so that requires the coefficient of $e^{-kx}$ to be zero. For $x > y$, the reverse
is true and only the $e^{-kx}$ can be present.

$$g(x) = \begin{cases} Ae^{kx} & (x < y) \\ Be^{-kx} & (x > y) \end{cases}$$

Now I have to make g satisfy Eq. (17.30) at x = y. 

Compute $dg/dx$. But wait. This is impossible unless the function is at least continuous. If it isn’t then I'd be
differentiating a step function and I don't want to do that (at least not yet). That is

$$g(y-) = Ae^{ky} = g(y+) = Be^{-ky} \tag{17.31}$$

This is one equation in the two unknowns $A$ and $B$. Now differentiate.

$$\frac{dg}{dx} = \begin{cases} Ake^{kx} & (x < y) \\ -Bke^{-kx} & (x > y) \end{cases}$$

This is in turn differentiable everywhere except at $x = y$. There it has a step

$$\text{discontinuity in } g' = g'(y+) - g'(y-) = -Bke^{-ky} - Ake^{ky} \tag{17.32}$$

This means that $dg/dx$ is the sum of two things, one differentiable, and the other a step, a multiple of $\theta$.

$$\frac{dg}{dx} = \text{differentiable stuff} + \bigl(-Bke^{-ky} - Ake^{ky}\bigr)\theta(x - y)$$
The differentiable stuff satisfies the differential equation $g'' - k^2 g = 0$. For the rest, compute $d^2g/dx^2$:

$$\bigl(-Bke^{-ky} - Ake^{ky}\bigr)\frac{d}{dx}\theta(x - y) = \bigl(-Bke^{-ky} - Ake^{ky}\bigr)\delta(x - y)$$

Put this together and remember the equation you're solving, Eq. (17.30).

$$g'' - k^2 g = 0 \text{ (from the differentiable stuff)} + \bigl(-Bke^{-ky} - Ake^{ky}\bigr)\delta(x - y) = \delta(x - y)$$

Now there are two equations for $A$ and $B$, this one and Eq. (17.31).

$$\begin{aligned} Ae^{ky} &= Be^{-ky} \\ -Bke^{-ky} - Ake^{ky} &= 1 \end{aligned} \qquad\text{solve these to get}\qquad \begin{aligned} A &= -e^{-ky}/2k \\ B &= -e^{ky}/2k \end{aligned}$$

Finally, back to $g$.

$$g(x) = \begin{cases} -e^{k(x-y)}/2k & (x < y) \\ -e^{-k(x-y)}/2k & (x > y) \end{cases} \tag{17.33}$$


When you get a fairly simple form of solution such as this, you have to see if you could have saved some work, perhaps 
replacing brute labor with insight? Of course. The original differential equation (17.30) is symmetric around the point 
y. It's plausible to look for a solution that behaves the same way, using (x — y) as the variable instead of x. Either that 
or you could do the special case y = 0 and then change variables at the end to move the delta function over to y. See 
problem 17.11. 

There is standard notation for this function; it is a Green’s function. 


$$G(x, y) = \begin{cases} -e^{k(x-y)}/2k & (x < y) \\ -e^{-k(x-y)}/2k & (x > y) \end{cases} = -e^{-k|x-y|}/2k \tag{17.34}$$


Now to solve the original differential equation, Eq. (17.29). Substitute into this equation the function 

$$\int_{-\infty}^{\infty} dy\,G(x, y)F(y)$$

$$\left[\frac{d^2}{dx^2} - k^2\right]\int_{-\infty}^{\infty} dy\,G(x, y)F(y) = \int_{-\infty}^{\infty} dy\left[\frac{d^2G(x, y)}{dx^2} - k^2G(x, y)\right]F(y) = \int_{-\infty}^{\infty} dy\,\delta(x - y)F(y) = F(x)$$


This is the whole point of delta functions. They make this sort of manipulation as easy as dealing with an ordinary 
function. Easier, once you're used to them. 

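For example, if $F(x) = F_0$ between $-x_0$ and $+x_0$ and zero elsewhere, then the solution for $f$ is

$$\int_{-\infty}^{\infty} dy\,G(x, y)F(y) = -\int_{-x_0}^{x_0} dy\,F_0\,e^{-k|x-y|}/2k$$

$$= -\frac{F_0}{2k}\begin{cases} \int_{-x_0}^{x_0} dy\,e^{-k(x-y)} & (x > x_0) \\ \int_{-x_0}^{x} dy\,e^{-k(x-y)} + \int_{x}^{x_0} dy\,e^{-k(y-x)} & (-x_0 < x < x_0) \\ \int_{-x_0}^{x_0} dy\,e^{-k(y-x)} & (x < -x_0) \end{cases}$$

$$= -\frac{F_0}{k^2}\begin{cases} e^{-kx}\sinh kx_0 & (x > x_0) \\ \bigl[1 - e^{-kx_0}\cosh kx\bigr] & (-x_0 < x < x_0) \\ e^{kx}\sinh kx_0 & (x < -x_0) \end{cases} \tag{17.35}$$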

You can see that this resulting solution is an even function of x, necessarily so because the original differential equation 
is even in x, the function F(x) is even in x, and the boundary conditions are even in x. 
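
A short Python sketch can confirm this result two ways: by doing the Green's function integral numerically, and by substituting the closed form back into $f'' - k^2 f = F$ with a finite difference. It takes $G(x, y) = -e^{-k|x-y|}/2k$ and the piecewise solution $-(F_0/k^2)\{e^{-kx}\sinh kx_0,\ 1 - e^{-kx_0}\cosh kx,\ e^{kx}\sinh kx_0\}$ as its inputs. The parameter values and function names are illustrative assumptions.

```python
import math

def green(x, y, k):
    # G(x, y) = -exp(-k|x - y|) / (2k)
    return -math.exp(-k * abs(x - y)) / (2.0 * k)

def f_numeric(x, k, F0, x0, steps=20_000):
    # Midpoint-rule estimate of the integral of G(x, y) F0 over -x0 < y < x0
    h = 2.0 * x0 / steps
    return h * sum(green(x, -x0 + (i + 0.5) * h, k) * F0 for i in range(steps))

def f_closed(x, k, F0, x0):
    # Piecewise closed form of the solution
    if x > x0:
        return -(F0 / k**2) * math.exp(-k * x) * math.sinh(k * x0)
    if x < -x0:
        return -(F0 / k**2) * math.exp(k * x) * math.sinh(k * x0)
    return -(F0 / k**2) * (1.0 - math.exp(-k * x0) * math.cosh(k * x))

def residual(x, k, F0, x0, h=1e-3):
    # f'' - k^2 f from a central difference of the closed form
    f = lambda s: f_closed(s, k, F0, x0)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2 - k**2 * f(x)

k, F0, x0 = 1.3, 2.0, 1.0      # illustrative values
checks = [(x, f_numeric(x, k, F0, x0), f_closed(x, k, F0, x0))
          for x in (-2.5, -0.4, 0.0, 0.7, 3.0)]
for row in checks:
    print(row)
print(residual(0.3, k, F0, x0), residual(2.0, k, F0, x0))   # about F0 inside, about 0 outside
```

The residual test is the more direct of the two: inside the interval the differential operator returns $F_0$, outside it returns zero.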

Other Differential Equations 

Can you apply this method to other equations? Yes, many. Try the simplest first order equation: 

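$$\frac{dG}{dx} = \delta(x) \quad\Longrightarrow\quad G(x) = \theta(x)$$

$$\frac{df}{dx} = g(x) \quad\Longrightarrow\quad f(x) = \int_{-\infty}^{\infty} dx'\,G(x - x')g(x') = \int_{-\infty}^{x} dx'\,g(x')$$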


which clearly satisfies df /dx = g. 

If you try $d^2G/dx^2 = \delta(x)$ you explain the origin of problem 1.48.

Take the same equation $d^2G/dx^2 = \delta(x - x')$ but in the domain $0 < x < L$ and with the boundary conditions
$G(0) = 0 = G(L)$. The result is

$$G(x) = \begin{cases} x(x' - L)/L & (0 < x < x') \\ x'(x - L)/L & (x' < x < L) \end{cases} \tag{17.36}$$
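
As a check on this finite-interval Green's function, the sketch below uses it to solve $f'' = 1$ with $f$ vanishing at both ends, for which the exact answer is $x(x - L)/2$. The uniform source, the value $L = 2$, and the function names are assumptions chosen for illustration.

```python
def green_interval(x, xp, L):
    # Piecewise-linear Green's function for G'' = delta(x - x'), zero at x = 0 and x = L
    if x < xp:
        return x * (xp - L) / L
    return xp * (x - L) / L

def solve(x, source, L, steps=20_000):
    # Midpoint-rule estimate of the integral of G(x, x') source(x') over 0 < x' < L
    h = L / steps
    return h * sum(green_interval(x, (i + 0.5) * h, L) * source((i + 0.5) * h)
                   for i in range(steps))

def slope_jump(xp, L, h=1e-6):
    # Jump in dG/dx across x = x', from one-sided differences; it should be 1
    right = (green_interval(xp + h, xp, L) - green_interval(xp, xp, L)) / h
    left = (green_interval(xp, xp, L) - green_interval(xp - h, xp, L)) / h
    return right - left

L = 2.0
uniform = lambda xp: 1.0       # illustrative source: f'' = 1
samples = {x: solve(x, uniform, L) for x in (0.25, 1.0, 1.6)}
exact = {x: x * (x - L) / 2.0 for x in samples}
print(samples, exact, slope_jump(0.7, L))
```

The unit jump in slope at $x = x'$ is what produces the delta function on the second differentiation, exactly as in the infinite-domain case.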
Solve Eq. (17.30) another way. Fourier transform everything in sight.

$$\frac{d^2 g}{dx^2} - k^2 g = \delta(x - y) \quad\to\quad \int_{-\infty}^{\infty} dx\left[\frac{d^2 g}{dx^2} - k^2 g\right]e^{-iqx} = \int_{-\infty}^{\infty} dx\,\delta(x - y)e^{-iqx} \tag{17.37}$$


The right side is designed to be easy. For the left side, do some partial integrations. I’m looking for a solution that goes 
to zero at infinity, so the boundary terms will vanish. See Eq. (15.12). 



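$$\int_{-\infty}^{\infty} dx\,\bigl[-q^2 - k^2\bigr]g(x)e^{-iqx} = e^{-iqy} \tag{17.38}$$

The left side involves only the Fourier transform of $g$. Call it $\tilde g$.

$$\bigl[-q^2 - k^2\bigr]\tilde g(q) = e^{-iqy}, \quad\text{so}\quad \tilde g(q) = -\frac{e^{-iqy}}{q^2 + k^2}$$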


Now invert the transform.

$$g(x) = \int_{-\infty}^{\infty}\frac{dq}{2\pi}\,\tilde g(q)e^{iqx} = -\int_{-\infty}^{\infty}\frac{dq}{2\pi}\,\frac{e^{iq(x-y)}}{q^2 + k^2}$$

Do this by contour integration, where the integrand has singularities at $q = \pm ik$.

$$-\frac{1}{2\pi}\int_{C_1} dq\,\frac{e^{iq(x-y)}}{k^2 + q^2}$$

[Figure: the contour $C_1$ in the complex $q$-plane]

The poles are at $\pm ik$, and the exponential dominates the behavior of the integrand at large $|q|$, so there are two cases:
$x > y$ and $x < y$. Pick the first of these, then the integrand vanishes rapidly as $q \to +i\infty$.
Compute the residue.

$$\frac{e^{iq(x-y)}}{k^2 + q^2} = \frac{e^{iq(x-y)}}{(q - ik)(q + ik)} \approx \frac{e^{iq(x-y)}}{(q - ik)(2ik)}$$

The coefficient of $1/(q - ik)$ is the residue, so

$$g(x) = -\frac{1}{2\pi}\cdot 2\pi i\cdot\frac{e^{-k(x-y)}}{2ik} = -\frac{e^{-k(x-y)}}{2k} \qquad (x > y) \tag{17.39}$$

in agreement with Eq. (17.33). The $x < y$ case is yours.
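
If contour integration feels like sleight of hand, the inverse transform can also be done by brute force along the real axis. This Python sketch truncates the $q$ integral at a large cutoff (an approximation I am introducing, along with the helper names and the value $k = 1$) and compares it with the closed form $-e^{-k|x-y|}/2k$.

```python
import math

def g_from_transform(s, k, q_max=200.0, steps=200_000):
    # -(1/2pi) * integral over real q of exp(i q s)/(q^2 + k^2), truncated at |q| = q_max.
    # The imaginary part is odd in q and cancels, leaving a cosine integral over q > 0.
    h = q_max / steps
    total = 0.0
    for i in range(steps):
        q = (i + 0.5) * h
        total += math.cos(q * s) / (q * q + k * k)
    return -h * total / math.pi

def g_closed(s, k):
    # Closed form with s = x - y
    return -math.exp(-k * abs(s)) / (2.0 * k)

k = 1.0
pairs = [(s, g_from_transform(s, k), g_closed(s, k)) for s in (0.7, -1.3)]
for row in pairs:
    print(row)
```

The truncated integral reproduces both signs of $x - y$ without any choice of contour, at the price of a small truncation error that shrinks as the cutoff grows.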

17.8 More Dimensions 

How do you handle problems in three dimensions? $\delta(\vec r\,) = \delta(x)\delta(y)\delta(z)$. For example I can describe the charge density
of a point charge, $dq/dV$, as $q\,\delta(\vec r - \vec r_0)$. The integral of this is

$$\int q\,\delta(\vec r - \vec r_0)\,d^3r = q$$

as long as the position $\vec r_0$ is inside the volume of integration. For an example that uses this, look at the potential of a
point charge, satisfying Poisson's equation.

$$\nabla^2 V = -\rho/\epsilon_0$$

What is the potential for a specified charge density? Start by finding the Green's function.

$$\nabla^2 G = \delta(\vec r\,), \quad\text{or instead do:}\quad \nabla^2 G - k^2 G = \delta(\vec r\,) \tag{17.40}$$

The reason for starting with the latter equation is that you run into some problems with the first form. It's too singular.
I’ll solve the second one and then take the limit as $k \to 0$. The Fourier transform method is simpler, so use that.

$$\int d^3r\,\bigl[\nabla^2 G - k^2 G\bigr]e^{-i\vec q\cdot\vec r} = 1$$

When you integrate by parts (twice) along each of the three integration directions $dx$, $dy$, and $dz$, you pull down a
factor of $-q^2 = -q_x^2 - q_y^2 - q_z^2$ just as in one dimension.


$$\int d^3r\,\bigl[-q^2 G - k^2 G\bigr]e^{-i\vec q\cdot\vec r} = 1 \quad\text{or}\quad \tilde G(\vec q\,) = \frac{-1}{q^2 + k^2}$$

where $\tilde G$ is, as before, the Fourier transform of $G$.

Now invert the transform. Each dimension picks up a factor of $1/2\pi$.

$$G(\vec r\,) = -\int\frac{d^3q}{(2\pi)^3}\,\frac{e^{i\vec q\cdot\vec r}}{q^2 + k^2}$$

This is a three dimensional integral, and the coordinate system to choose is spherical. $q^2$ doesn't depend on the direction
of $\vec q$, and the single place that this direction matters is in the dot product in the exponent. The coordinates are $q$, $\theta$,
$\phi$ for the vector $\vec q$, and since I can pick my coordinate system any way I want, I will pick the coordinate axis along the
direction of the vector $\vec r$. Remember that in this integral $\vec r$ is just some constant.

$$G = -\frac{1}{(2\pi)^3}\int q^2\,dq\,\sin\theta\,d\theta\,d\phi\,\frac{e^{iqr\cos\theta}}{q^2 + k^2}$$


The integral $d\phi$ becomes a factor $2\pi$. Let $u = \cos\theta$, and that integral becomes easy. All that is left is the $dq$ integral.

$$G = -\frac{1}{(2\pi)^2}\int_0^{\infty}\frac{q^2\,dq}{q^2 + k^2}\,\frac{1}{iqr}\bigl[e^{iqr} - e^{-iqr}\bigr]$$

More contour integrals: There are poles in the $q$-plane at $q = \pm ik$. The $q^2$ factors are even. The $q$ in the
denominator would make the integrand odd except that the combination of exponentials in brackets is also odd. The
integrand as a whole is even. I will then extend the limits to $\pm\infty$ and divide by 2.

$$G = -\frac{1}{8\pi^2 ir}\int_{-\infty}^{\infty}\frac{q\,dq}{q^2 + k^2}\bigl[e^{iqr} - e^{-iqr}\bigr]$$

There are two terms; start with the first one. $r > 0$, so $e^{iqr} \to 0$ as $q \to +i\infty$. The contour is along the real axis, so
push it toward $i\infty$ and pick up the residue at $+ik$.

$$-\frac{1}{8\pi^2 ir}\int\frac{q\,dq}{q^2 + k^2}\,e^{iqr} = -\frac{1}{8\pi^2 ir}\,2\pi i\operatorname*{Res}_{q=ik}\frac{q}{(q - ik)(q + ik)}\,e^{iqr} = -\frac{1}{8\pi^2 ir}\,2\pi i\,\frac{ik}{2ik}\,e^{-kr} \tag{17.41}$$

For the second term, see problem 17.15. Combine it with the preceding result to get

$$G = -\frac{1}{4\pi r}e^{-kr} \tag{17.42}$$

This is the solution to Eq. (17.40), and if I now let $k$ go to zero, it is $G = -1/4\pi r$.
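
Here is a minimal Python sketch that tests the result $G = -e^{-kr}/4\pi r$ directly. It checks two things: that $(\nabla^2 - k^2)G$ vanishes away from the origin, using the radial form $\frac{1}{r}\frac{d^2}{dr^2}(rG)$ of the Laplacian, and that the volume integral of $(\nabla^2 - k^2)G$ over a ball of any radius is one, which is the delta function's signature. The function names, step sizes, and the value $k = 1.5$ are assumptions for illustration.

```python
import math

def G(r, k):
    # G = -exp(-k r) / (4 pi r)
    return -math.exp(-k * r) / (4.0 * math.pi * r)

def radial_operator(r, k, h=1e-4):
    # (1/r) d^2(r G)/dr^2 - k^2 G, estimated with a central difference
    u = lambda s: s * G(s, k)
    return (u(r + h) - 2.0 * u(r) + u(r - h)) / (h * h * r) - k * k * G(r, k)

def total_source(R, k, steps=20_000, h=1e-6):
    # Volume integral of (Laplacian G - k^2 G) over a ball of radius R.
    # The Laplacian part becomes the flux 4 pi R^2 G'(R) by the divergence theorem.
    flux = 4.0 * math.pi * R * R * (G(R + h, k) - G(R - h, k)) / (2.0 * h)
    dr = R / steps
    volume = dr * sum(G((i + 0.5) * dr, k) * 4.0 * math.pi * ((i + 0.5) * dr) ** 2
                      for i in range(steps))
    return flux - k * k * volume

k = 1.5
print(radial_operator(0.8, k))                          # close to zero away from the origin
print([total_source(R, k) for R in (0.1, 1.0, 3.0)])    # each close to one
```

The second check is independent of the radius of the ball, which says that all of the source sits at the origin.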

Just as in one dimension, once you have the Green’s function, you can write down the solution to the original
equation.

$$\nabla^2 V = -\rho/\epsilon_0 \quad\Longrightarrow\quad V = -\frac{1}{\epsilon_0}\int d^3r'\,G(\vec r, \vec r\,')\rho(\vec r\,') = \frac{1}{4\pi\epsilon_0}\int d^3r'\,\frac{\rho(\vec r\,')}{|\vec r - \vec r\,'|} \tag{17.43}$$

State this equation in English, and it says that the potential of a single point charge is $q/4\pi\epsilon_0 r$ and that the
total potential is the sum over the contributions of all the charges. Of course this development also provides the Green's
function for the more complicated equation (17.42).

Applications to Potentials 

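Let a charge density be $q\,\delta(\vec r\,)$. This satisfies $\int d^3r\,\rho = q$. The potential it generates is that of a point charge at the
origin. This just reproduces the Green’s function.

$$\phi = \frac{1}{4\pi\epsilon_0}\int d^3r'\,q\,\delta(\vec r\,')\frac{1}{|\vec r - \vec r\,'|} = \frac{q}{4\pi\epsilon_0 r} \tag{17.44}$$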


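What if $\rho(\vec r\,) = -p\,\partial\delta(\vec r\,)/\partial z$? [Why $-p$? Patience.] At least you can say that the dimensions of the constant $p$
are charge times length. Now use the Green’s function to compute the potential it generates.

$$\phi = \frac{1}{4\pi\epsilon_0}\int d^3r'\,(-p)\frac{\partial\delta(\vec r\,')}{\partial z'}\frac{1}{|\vec r - \vec r\,'|} = \frac{p}{4\pi\epsilon_0}\int d^3r'\,\delta(\vec r\,')\frac{\partial}{\partial z'}\frac{1}{|\vec r - \vec r\,'|} = \frac{p}{4\pi\epsilon_0}\frac{\partial}{\partial z'}\frac{1}{|\vec r - \vec r\,'|}\bigg|_{\vec r\,'=0} \tag{17.45}$$

This is awkward, so I’ll use a little trick. Make a change of variables in (17.45), $\vec u = \vec r - \vec r\,'$, then

$$\frac{\partial}{\partial z'} \to -\frac{\partial}{\partial u_z}, \qquad \frac{p}{4\pi\epsilon_0}\frac{\partial}{\partial z'}\frac{1}{|\vec r - \vec r\,'|}\bigg|_{\vec r\,'=0} = -\frac{p}{4\pi\epsilon_0}\frac{\partial}{\partial u_z}\frac{1}{u} \tag{17.46}$$

(See also problem 17.19.) Cutting through the notation, this last expression is just

$$\phi = \frac{-p}{4\pi\epsilon_0}\frac{\partial}{\partial z}\frac{1}{r} = \frac{-p}{4\pi\epsilon_0}\frac{\partial}{\partial z}\frac{1}{\sqrt{x^2 + y^2 + z^2}} = \frac{p}{4\pi\epsilon_0}\frac{z}{(x^2 + y^2 + z^2)^{3/2}} = \frac{p}{4\pi\epsilon_0}\frac{z}{r^3} = \frac{p}{4\pi\epsilon_0}\frac{\cos\theta}{r^2} \tag{17.47}$$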
The expression $-\partial(1/r)/\partial z$ is such a simple way to compute this, the potential of an electric dipole, that it is
worth trying to understand why it works. And in the process, why is this an electric dipole? The charge density, the
source of the potential, is a derivative, and a derivative is (the limit of) a difference quotient. This density is just that
of two charges, and they produce potentials just as in Eq. (17.44). There are two point charges, with delta-function
densities, one at the origin, the other at $-\hat z\,\Delta z$.



(17.48) 


The picture of the potential that arises from this pair of charges is (a). A negative charge ($-q = -p/\Delta z$) at $-\hat z\,\Delta z$ and
a corresponding positive charge $+q$ at the origin. This picture explains why this charge density, $\rho(\vec r\,) = -p\,\partial\delta(\vec r\,)/\partial z$,
represents an electric dipole. Specifically it represents a dipole whose vector representation points toward $+\hat z$, from the
negative charge to the positive one. It even explains why the result (17.47) has the sign that it does: The potential $\phi$
is positive along the positive $z$-axis and negative below. That’s because a point on the positive $z$-axis is closer to the
positive charge than it is to the negative charge.

Now the problem is to understand why the potential of this dipole ends up with a result as simple as the derivative
$(-p/4\pi\epsilon_0)\,\partial(1/r)/\partial z$: The potential at the point $P$ in the figure (a) comes from the two charges at the two distances
$r_1$ and $r_2$. Construct figure (b) by moving the line $r_2$ upward so that it starts at the origin. Imagine a new charge
configuration consisting of only one charge, $q = p/\Delta z$ at the origin. Now evaluate the potentials that this single charge
produces at two different points $P_1$ and $P_2$ that are a distance $\Delta z$ apart. Then subtract them.

$$\phi(P_2) - \phi(P_1) = \frac{q}{4\pi\epsilon_0}\left[\frac{1}{r_2} - \frac{1}{r_1}\right]$$

In the notation of the preceding equations this is

$$\phi(P_2) - \phi(P_1) = \frac{q}{4\pi\epsilon_0}\left[\frac{1}{|\vec r + \hat z\,\Delta z|} - \frac{1}{r}\right] = \frac{p}{4\pi\epsilon_0\,\Delta z}\left[\frac{1}{|\vec r + \hat z\,\Delta z|} - \frac{1}{r}\right]$$
Except for a factor of $(-1)$, this is Eq. (17.48), and it explains why the potential caused by an ideal electric dipole at
the origin can be found by taking the potential of a point charge and differentiating it.
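
The limiting process is easy to watch numerically. In this Python sketch a charge $+q$ sits at the origin and $-q$ sits a distance $\Delta z$ below it, with $q = p/\Delta z$ held to a fixed product $p$. The potential of the pair is compared with the ideal dipole form $p\,z/r^3$. Units with $4\pi\epsilon_0 = 1$, the observation point, and the function names are assumptions made for brevity.

```python
import math

def two_charge_potential(x, y, z, p, dz):
    # +q at the origin and -q at (0, 0, -dz), with q = p/dz; units with 4 pi epsilon_0 = 1
    q = p / dz
    r_plus = math.sqrt(x * x + y * y + z * z)
    r_minus = math.sqrt(x * x + y * y + (z + dz) ** 2)
    return q / r_plus - q / r_minus

def dipole_potential(x, y, z, p):
    # Ideal dipole along +z in the same units: p z / r^3
    r = math.sqrt(x * x + y * y + z * z)
    return p * z / r**3

point = (0.3, -0.4, 1.2)     # arbitrary observation point
p = 2.0
errors = [abs(two_charge_potential(*point, p, dz) - dipole_potential(*point, p))
          for dz in (1e-1, 1e-2, 1e-3)]
print(errors)    # shrinks roughly in proportion to dz
```

The error falls by about a factor of ten each time the separation does, as expected for a difference quotient approaching a derivative.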


$$\text{point charge:}\quad \phi = \frac{q}{4\pi\epsilon_0 r}$$

Here, the electric dipole strength is qa. 

Can you repeat this process? Yes. Instead of two opposite point charges near each other, you can place two 
opposite point dipoles near each other. 



(17.49) 


This is the potential from the charge density $\rho = +Q\,\partial^2\delta(\vec r\,)/\partial z^2$, where $Q = qa^2$. [What about $\partial^2/\partial x\,\partial y$?]


Exercises 


1 What is the analog of Eq. (17.4) for the linear mass density $\lambda(x) = C$ (a constant) for $0 < x < L$ and zero otherwise?


2 Take the preceding mass density and add a point mass $m_0$ at $x = L/2$. What is the distribution $m([x_1, x_2])$ now?

3 Use the $\lambda$ from the first exercise and define the functional $F[\phi] = \int_{-\infty}^{\infty} dx\,\lambda(x)\phi(x)$. What is the total mass,
$F[1] = M$? What is the mean position of the mass, $F[x]/M$?

4 As in the preceding exercise, what are the variance, the skewness, and the kurtosis excess? 

5 What is f^dxS(x — Xo)l 

6 Pick any two of Eq. (17.20) and show that they are valid delta sequences. 

7 What is i'^oc dtS(t)l 

8 In Eq. (17.12) the function $\phi_n(x)$ appeared. Sketch a graph of it and of $-\phi'_n(x)$, which is needed in Eq. (17.11).
Problems

17.1 Calculate the mean, the variance, the skewness, and the kurtosis excess for a Gaussian: $f(g) = Ae^{-B(g - g_0)^2}$
($-\infty < g < \infty$). Assume that this function is normalized the same way that Eq. (17.8) is, so that its integral is one.

17.2 Calculate the mean, variance, skewness, and the kurtosis excess for a flat distribution, $f(g) = \text{constant}$, ($0 < g <
g_{\max}$). Ans: Var $= g_{\max}^2/12$, kurt. exc. $= -6/5$

17.3 Derive the results stated in Eq. (17.9). Compare mv 2 /2 to K.E. Compare this to the results of problem 2.48. 

17.4 Show that you can rewrite Eq. (17.16) as an integral f^_ dt' costUo(t — t')F(t') and differentiate this directly, 
showing yet again that (17.15) satisfies the differential equation. 

17.5 What are the units of a delta function? 

17.6 Show that

$$\delta\bigl(f(x)\bigr) = \delta(x - x_0)/|f'(x_0)|$$

where $x_0$ is the root of $f$. Assume just one root for now, and the extension to many roots will turn this into a sum as
in Eq. (17.28).

17.7 Show that 

(a) $x\delta'(x) = -\delta(x)$ (b) $x\delta(x) = 0$

(c) $\delta'(-x) = -\delta'(x)$ (d) $f(x)\delta(x - a) = f(a)\delta(x - a)$

17.8 Verify that the functions in Eq. (17.20) satisfy the requirements for a delta sequence. Are they normalized to have 
an integral of one? Sketch each. Sketch Eq. (17.26). It is complex, so sketch both parts. How can a delta sequence be 
complex? Verify that the imaginary part of this function doesn't contribute. 

17.9 What is the analog of Eq. (17.25) if $\delta_n$ is a sequence of Gaussians: $\sqrt{n/\pi}\,e^{-nx^2}$?

Ans: $\theta_n(x) = \tfrac{1}{2}\bigl[1 + \operatorname{erf}(x\sqrt{n})\bigr]$

17.10 Interpret the functional derivative of the functional in Eq. (17.18): $\delta\,\delta[\phi]/\delta\phi$. Despite appearances, this actually
makes sense. Ans: $\delta(x)$
17.11 Repeat the derivation of Eq. (17.33) but with less labor, selecting the form of the function $g$ to simplify the work.
In the discussion following this equation, reread the comments on this subject. 

17.12 Verify the derivation of Eq. (17.35). Also examine this solution for the cases that Xo is very large and that it is 
very small. 

17.13 Fill in the steps in section 17.7 leading to the Green's function for $g'' - k^2 g = \delta$.

17.14 Derive the analog of Eq. (17.39) for the case x < y. 

17.15 Calculate the contribution of the second exponential factor leading to Eq. (17.41). 

17.16 Starting with the formulation in Eq. (17.23), what is the result of $\delta'$ and of $\delta''$ on a test function? Draw sketches
of a typical $\delta_n$, $\delta'_n$, and $\delta''_n$.

17.17 If $\rho(\vec r\,) = qa^2\,\partial^2\delta(\vec r\,)/\partial z^2$, compute the potential and sketch the charge density. You should express your answer
in spherical coordinates as well as rectangular, perhaps commenting on the nature of the results and relating it to
functions you have encountered before. You can do this calculation in either rectangular or spherical coordinates.

Ans: $(2qa^2/4\pi\epsilon_0)\,P_2(\cos\theta)/r^3$

17.18 What is a picture of the charge density $\rho(\vec r\,) = qa^2\,\partial^2\delta(\vec r\,)/\partial x\,\partial y$?
(Planar quadrupole) What is the potential for this case?

17.19 In Eq. (17.46) I was not at all explicit about which variables are kept constant in each partial derivative. Sort
this out for both $\partial/\partial z'$ and for $\partial/\partial u_z$.

17.20 Use the results of problem 17.16, showing graphs of $\delta_n$ and its derivatives. Look again at the statements
leading up to Eq. (17.31), that $g$ is continuous, and ask what would happen if it is not. Think of the right hand side of
Eq. (17.30) as a $\delta_n$ too in this case, and draw a graph of the left side of the same equation if $g_n$ is assumed to change
very fast, approaching a discontinuous function as $n \to \infty$. Demonstrate by looking at the graphs of the left and right
side of the equation that this can't be a solution and so that $g$ must be continuous as claimed.

17.21 Calculate the mean, variance, skewness, and the kurtosis excess for the density $f(g) = A[\delta(g) + \delta(g - g_0) +
\delta(g - xg_0)]$. See how these results vary with the parameter $x$.

Ans: skewness $= 2^{-3/2}(1 + x)(x - 2)(2x - 1)/(1 - x + x^2)^{3/2}$
kurt. excess $= -3 + \tfrac{3}{4}\bigl(1 + x^4 + (1 - x)^4\bigr)/(1 - x + x^2)^2$
17.22 Calculate the potential of a linear quadrupole as in Eq. (17.49). Also, what is the potential of the planar array
mentioned there? You should be able to express the first of these in terms of familiar objects. 


17.23 (If this seems out of place, it’s used in the next problems.) The unit square, $0 < x < 1$ and $0 < y < 1$, has area
$\int dx\,dy = 1$ over the limits of $x$ and $y$. Now change the variables to

$$u = \tfrac{1}{2}(x + y) \quad\text{and}\quad v = x - y$$

and evaluate the integral, $\int du\,dv$ over the square, showing that you get the same answer. You have only to work out
all the limits. Draw a picture. This is a special example of how to change multiple variables of integration. The single
variable integral generalizes

$$\text{from}\quad \int f(x)\,dx = \int f(x)\frac{dx}{du}\,du \quad\text{to}\quad \int f(x, y)\,dx\,dy = \int f(x, y)\frac{\partial(x, y)}{\partial(u, v)}\,du\,dv,$$

where

$$\frac{\partial(x, y)}{\partial(u, v)} = \det\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}$$

For the given change from x, y to u, v show that this Jacobian determinant is one. A discussion of the Jacobian appears 
in many advanced calculus texts. 


17.24 Problem 17.1 asked for the mean and variance for a Gaussian, $f(g) = Ae^{-B(g - g_0)^2}$. Interpreting this as a
distribution of grades in a class, what is the resulting distribution of the average of any two students? That is, given
this function for all students, what is the resulting distribution of $(g_1 + g_2)/2$? What is the mean of this and what is the
root-mean-square deviation from the mean? How do these compare to the original distribution? To do this, note that
$f(g)\,dg$ is the fraction of students in the interval $g$ to $g + dg$, so $f(g_1)f(g_2)\,dg_1\,dg_2$ is the fraction for both. Now make
the change of variables

$$x = \tfrac{1}{2}(g_1 + g_2) \quad\text{and}\quad y = g_1 - g_2$$

then the fraction of these coordinates between $x$ and $x + dx$ and $y$ and $y + dy$ is

$$f(g_1)f(g_2)\,dx\,dy = f(x + y/2)f(x - y/2)\,dx\,dy$$

Note where the result of the preceding problem is used here. For fixed $x$, integrate over all $y$ in order to give you the
fraction between $x$ and $x + dx$. That is the distribution function for $(g_1 + g_2)/2$. [Complete the square.] Ans: Another
Gaussian with the same mean and with rms deviation from the mean decreased by a factor $\sqrt{2}$.
17.25 Same problem as the preceding one, but the initial function is

$$f(g) = \frac{a/\pi}{a^2 + g^2} \qquad (-\infty < g < \infty)$$

In this case however, you don’t have to evaluate the mean and the rms deviation. Show why not.

Ans: The result reproduces the original f(g) exactly, with no change in the spread. These two problems illustrate 
examples of “stable distributions,” for which the distribution of the average of two variables has the same form as the 
original distribution, changing at most the widths. There are an infinite number of other stable distributions, but there 
are precisely three that have simple and explicit forms. These examples show two of them. The Residue Theorem helps 
here. 


17.26 Same problem as the preceding two, but the initial function is 


(a) $f(g) = 1/g_{\max}$ for $0 < g < g_{\max}$

(b) $f(g) = \tfrac{1}{2}\bigl[\delta(g - g_1) + \delta(g - g_2)\bigr]$


17.27 In the same way as defined in Eq. (17.10), what is the functional derivative of Eq. (17.5)?

17.28 Rederive Eq. (17.27) by choosing an explicit delta sequence, $\delta_n(x)$.

17.29 Verify the result in Eq. (17.36). 
447, 515

covariant components 419 
cylindrical coordinates 255 
spherical coordinates 255 
Gram-Schmidt, 175, 180, 185 
graphs, 19-22 
grating, 83 
gravitational: 
energy, 315 

field 254, 274, 298, 312 
lensing 538 

potential 246, 275, 300, 337 
Green's function, 102-104, 125, 127, 499, 551, 562 

• 

half-life, 125 
Hamiltonian, 519, 531 
harmonic, 143 
harmonic oscillator, 89-90, 96, 325, 343
Green's function 551 
critical 123, 222 
damped 89, 126 
energy 92 
forced 92, 93, 146 
Green's function 102, 125 
Greens’ function 499 
resonance 147 
harmonic resonance, 149 
Hawaii, 510 
heads, 9 

heat equation, 320 

heat flow, 320-336 

heat flow vector, 322 

heat metric, 543 

heat wave, 326 

heat, minimum, 273, 276 

heated disk, 243 

Helmholtz Decomposition, 447 

Helmholtz-Hodge, 450 

hemisemidemiquaver, 497 

Hermite, 181, 230, 386 

hermitian, 225, 447 

Hero, 26 

Hessian, 258, 536 
Hilbert Space, 178 
history, blame, 553 
horoscope, 374 
hot road mirage, 520 
Hulk, 66 
Huygens, 47 

hyperbolic functions, 2-4, 74, 124, 324, 330, 495 
inverse 3 


Iceland spar, 403 

ideal gas, 548 

idempotent, 225 

impact parameter, 266 

impedance, 108, 129 

impulse, 102, 104 

independence, 168, 375 

index notation, 304, 317, 398, 410 

index of refraction, 63, 269, 277, 433 

indicial equation, 98 

inductor, 107 

inertia: 

moment of, 15, 196, 230, 251 
tensor 191, 195-200, 225, 391, 396, 402, 427 
infallible, 50 
infinite series, 131 
infinite-dimensional, 177 
infinitesimal, 240, 242, 415 
inflation, 59 

instability, 357, 372, 381 
integral, 11-18 
contour 463 
fractional 508 
numerical 361-367 
principal value 385, 386 
Riemann 11 
Stieltjes 15 
surface 283, 288, 293 
test 35, 333 

intensity, 47-48, 50, 150, 270 
interest rate, 59 
interpolation, 353-388 
inverse transform, 494 
iterative method, 101 


Jacobi, 232 



Jacobian, 423, 569 


kinetic energy, 60, 273, 549 
density 248 
kinks, 532 
Klein bottle, 439 
kludge, 552 

Kronecker delta, 174, 203, 305 
kurtosis, 548, 567 

• 

L-R circuit, 106 
Lagrange, 513, 525 

Lagrange multiplier, 259-263, 276, 378, 380, 528 

Lagrangian, 528 

Laguerre, 97, 386 

A, 208 

Lanczos, 366 

Laplace, 164, 335, 337, 448, 522, 542 
Laplacian, 301, 343, 351 
Laurent series, 464, 468, 486, 487 
Lax-Friedrichs, 383, 388 
Lax-Wendroff, 383, 388 
least square, 175, 256, 374, 376, 387 
least upper bound, 172 
Lebesgue, 15 

Legendre, 62, 97, 116-118, 131, 181, 228, 275, 313, 
352, 366, 388 
Legendre function, 120 
length of curve, 430, 512 
lens, 539 
limit cycle, 357 
line integral, 434, 442, 463 
linear: 

charge density, 344 
programming 278 
charge density 57
difference equation 372, 388 
equations 26, 110 
function 392 
functional 394, 395 
independence 168 
transformation 192 
linearity, 189, 191, 427 
Lobatto integration, 388 
logarithm, 3, 32, 78, 106, 479 
longitude, latitude, 250 
Lorentz force, 446 
• 

m(V), 546 

magnetic field, 254, 273, 313, 413, 425 
tensor 231, 402 
magnetic flux, 446 
magnetic monopole, 455 
manifold, 414, 415, 423 
mass density, 251, 298, 312 
matrix, 173, 193-224, 375, 399 
correlation 378 
positive definite 259 
as operator 211 

column 170, 174, 194, 211, 229, 401 
diagonal 213 
diagonalize? 221 
identity 202 

multiplication 194, 201, 205 
scalar product 173 
Maxwell, 403 

Maxwell’s equations, 165, 443, 447, 455 

Maxwell-Boltzmann, 548 

mechanics, 528 

messy and complicated, 94 

metric tensor, 396, 412, 421, 424 
microwave, 337
midpoint integration, 13, 363 
mirage, 521 
Morse Theory, 538 
mortgage, 59 
musical instrument, 143 
Mobius strip, 439 
• 

n th mean, 66 
NASA, 246 
natural boundary, 490 
New York, 174 
Newton, 516, 528 

gravity 298-303, 313, 315 
method 356 
nilpotent, 225, 232 
NIST, 9 

Noether’s theorem, 531 
noise, 379, 387 

non-differentiable function, 367 
norm, 171-177, 182, 257 
normal modes, 114, 172 
nuclear fission, 538 

• 

Oops, 552 

operator, 188, 192-224 
components 193, 219 
differential 140, 200 
exponential 63, 275 
inverse 203 

rotation 188, 195, 203 
translation 231 
vector 297 

optical path, 433, 519 
optics, 519 


orange peel, 241 
orbital, 186 
order, 465 
orientation, 207 
orthogonal, 225, 447 
orthogonal coordinates, 250, 293, 415 
orthogonality, 133, 140, 257, 325, 345 
orthogonalization, 175 
orthonormal, 174 
oscillation, 90, 113, 115, 340 
coupled 112, 126, 170, 216 
damped 91 
forced 95 

temperature 325, 348 

• 

panjandrum, 545 
paraboloid, 259 
parallel axis theorem, 198 
parallelepiped, 227, 282, 428 
parallelogram, 205, 208, 227 
parallelogram identity, 182 
parameters, 51 

parametric differentiation, 4, 7, 25 
Parseval's identity, 145, 150, 159, 496 
di, 306, 316 

partial integration, 18, 29, 139, 448, 514, 530, 549 

dS, dV, 436 

partition function, 263 

Pascal's triangle, 62 

Pauli, 229 

PDE, numerical, 381 
Peano, 162 
pendulum, 170 
periodic, 72, 338 

periodic boundary condition, 341, 492, 503 
perpendicular axis theorem, 229

Perron-Frobenius, 224 

pinking shears, 478 

pitfall, 357 

Plateau, 533 

Poisson, 300, 562 

polar coordinates, 18, 84, 239, 274, 464, 528 
polarizability, 192 
polarizability tensor, 393, 404 
polarization, 192, 270, 404 
pole, 465, 470, 483 
order 465, 466 
polynomial, 162, 483 
characteristic 231 
Chebyshev 186 
Hermite 181, 230, 386 
Laguerre 386 

Legendre 62, 117, 118, 128, 181, 228, 275, 316, 
318, 352, 366, 569 
population density, 545 
Postal Service, 172, 278 
potential, 62, 300, 303, 316, 337-342, 522, 564 
potential energy density, 273 
power, 320, 332 
power mean, 66 
power series, 462, 513 
power spectrum, 151 
Poynting vector, 407 
pre-Snell law, 277 
pressure tensor, 396 
prestissimo, 497 
principal components, 379, 389 
principal value, 385 
probability, 9 
product formula, 462 


product rule, 11 
Pythagorean norm, 174 

• 

quadratic equation, 65, 84 
quadrupole, 62, 254, 274, 275, 318 
quasi-static, 96 

• 

radian, 1 
radioactivity, 125 
rainbow, 267-270, 277 
random variable, 379 
range, 192, 393 
ratio test, 35 
rational number, 177 
reality, 333, 353 

reciprocal basis, 409, 415, 418, 428 

reciprocal vector, 422 

rectifier, 157 

recursion, 8, 25 

regular point, 96 

regular singular point, 96 

relation, 392 

relative error, 354 

relativity, 60, 423 

residue, 468, 501 

residue theorem, 468, 469 

resistor, 107, 273, 276 

resonance, 147 

Reynolds transport theorem, 445, 455 
Riemann integral, 11, 463 
Riemann surface, 476-479, 488 
Riemann-Stieltjes integral, 15, 230 
rigid body, 189, 195, 214, 391 
roots of unity, 76 
rotation, 188, 192, 195, 203, 204, 287, 293 
components 195 
composition 202 
roundoff error, 373, 386 
rpncalculator, 7 
ruler, 510 

Runge-Kutta, 367, 389 
Runge-Kutta-Fehlberg, 369 

• 

saddle, 259 

saddle point, 256, 257, 276 

scalar product, 136, 148, 171-177, 181, 182, 226, 306, 329, 338, 394, 409, 419 
scattering, 265-270 
scattering angle, 266, 277 
Schwartz, 554 
Schwarzenegger, 247 
secant method, 358 
self-adjoint, 225 
semiperimeter, 26 
separated solution, 328 
separation of variables, 105, 322-342, 343 
series, 31-46 
of series 38 
absolute convergence 37 
common 32 
comparison test 35, 37 
convergence 34, 59 
differential equation 96, 127 
double 342, 367 
examples 31 
exponential 230 
faster convergence 62 
Frobenius 97-99, 464 
geometric 230 
hyperbolic sine 330 
integral test 35 
Laurent 464 
power 32, 289, 353 
ratio test 35 
rearrange 38 
secant 38, 60 
telescoping 62, 157 
two variables 39, 61 
setting sun, 457 
sheet, 477, 479 
dσ/dΩ, 266-270, 277 
similarity transformation, 219 
simple closed curve, 468 
simply-connected, 442 
Simpson’s rule, 363, 542 
simultaneous equations, 110, 113 
sine integral, 153, 159 
sine transform, 503 
singular perturbations, 129 
singular point, 96 
singularity, 465, 468, 481 
sinh, 2, 74, 124, 140, 323, 516 
sketching, 19 
skewness, 548, 567 
Snell, 519 
Snell’s law, 541 
snowplow, 125 
soap bubbles, 533 
Sobolev, 185, 554 
solenoid, 254, 318 
solid angle, 263, 266 
space, 414 

specific heat, 236, 320 
spectral density, 505 
speed of light, 512 
Spiderman, 66 
√2, 111 

square-integrable, 164, 167 
stable distribution, 570 
stainless steel, 347 
Stallone, 247 

steady-state, 109, 110, 129 
step function, 551, 554 
steradian, 264 

Stirling’s formula, 41, 61, 262 

stock market, 131 

Stokes' Theorem, 297, 439, 443 

straight line, 516 

strain, 287 

stress, 396 

stress-strain, 393 

string, 182 

Sturm-Liouville, 139, 350 

subspace, 447 

sum of cosines, 85 

Sumerians, 1 

summation by parts, 18 

summation convention, 221, 230, 304, 410 

sun, 315 

superposition, 334 

surface integral, 282, 284, 440, 443, 445 
closed 288 
examples 283 
symmetric, 225, 401, 447 

• 

tails, 9 
tanh, 2 

Taylor series, 32, 464, 467, 486 
telescoping series, 62, 157 


temperature, 263 
gradient 520 
distribution 327, 331 
expansion 510 
of slab 324 
oscillation 325 
profile 325 

tensor, 188, 287, 392 
component 398 
contravariant components 411 
field 415 
inertia 195-200, 214, 228, 391, 396, 402, 427 
metric 396, 421, 424 
rank 395, 396, 397, 402, 403, 411 
stress 396, 415 
totally antisymmetric 403 
transpose 401 
terminal speed, 52 
test function, 553, 556 
tetrahedron, 459 
Texas A&M, 163 
thermal expansion, 510 
θ(x), 551 
thin lens, 538 
time average, 150 
time of travel, 432 
torque, 189, 214 
tough integral, 56 
trace, 173, 211, 234 
trajectory, 52 
transformation, 188 
area 206 
basis 421 
composition 212 
determinant 206, 428 
electromagnetic field 425, 426, 429 
linear 192 
Lorentz 423-424, 426 
matrix 422 
similarity 188, 219, 220, 232 
transient, 109, 110 
transport theorem, 445 
transpose, 401, 402 
trapezoidal integration, 363, 525 
triangle area, 19, 26 
triangle inequality, 171, 177 
trigonometric identities, 73, 77 
trigonometry, 1 
triple scalar product, 317 
• 

uncountable sum, 183 
unitary, 225 

• 

variance, 374, 379, 380, 381, 387, 548, 567 
variational approximation, 523 
vector space, 162-163, 176, 177, 257, 414, 446, 549 
axioms 163 
basis 168 
dimension 168 
examples 164, 171, 181 
not 166, 170 
scalar product 171 
subspace 167 
theorems 183 
vector: 
calculus 413, 430 
derivative 285 
eigenvector 378 
field 350, 413, 414 
gradient 244, 245, 246, 260 
heat flow 332 
identities 317, 522 
operators 297 
potential 442 
unit 253, 254, 274 
volume element, 250 
volume of a sphere, 252, 274 

• 

wave, 405 

wave equation, 275, 320, 350, 381 
Weierstrass, 461 
Weierstrass-Erdmann, 533 
weird behavior, 535 
wet friction, 89 
wheel-alignment, 214 
Wiener-Khinchine theorem, 505 
wild oscillation, 372 
winding number, 477, 480 
wine cellar, 327 
work, 433 

work-energy theorem, 434 

• 

ζ(2), 36, 156 

zeta function, 9, 34, 59 



